
04. GPU Frame Capture: pt.1 UI

Xcode provides powerful tools for in-depth diagnostics of Metal's GPU side. In this episode I give an overview of the Metal capture tool, or at least its UI side (capturing in code, debugging and profiling will come in later episodes). The idea is to show you what exactly you have first, and later what you can get from it.

How to start capturing

To start capturing, simply press the button with the Metal icon or choose Debug->Capture GPU Workload from the menu.

NOTE: Capturing on a device requires shader validation to be turned off.

Scopes

There are several scopes you can capture. Whichever you choose, the smallest captured entity is a command buffer, so the scope only defines where capturing starts and ends (that is, which command buffers fall inside it) and how many such scopes are captured.

  • Frame

    A frame is delimited by presentDrawable calls: all command buffers between two such calls are captured. That's what you need in most cases when rendering on screen. Keep in mind that if you use caching or something similar, you probably need to capture more than one frame. If you have several CAMetalLayers, only some of them will be captured, and you have no control over which.

  • Metal Layer

    Captures presentDrawable scopes for a particular Metal layer. It solves the issue you would have with the plain Frame scope when there are several layers.

    When you select a Metal layer in the list, it is highlighted on the device.

  • Command Queue

    Captures command buffers from a particular command queue (if you have more than one). If your queue has more than one command buffer and you need to capture all of them, set the count high enough to cover them all. That might be helpful if you have offscreen rendering.

  • Device

    Captures everything on the selected device. Use it if you have several queues with several command buffers. Moreover, sometimes it's the only way to capture on some devices or simulators (unfortunately Xcode isn't as stable as we would like it to be).
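
To make the Frame scope more concrete, here is a minimal sketch of what one "frame" looks like in code. The helper name `renderFrame` and the empty clear-only pass are my own illustration, not something from a real project; the point is only that the present call marks the frame boundary the capture tool looks for.

```swift
import Metal
import QuartzCore

/// Hypothetical helper (name and structure are assumptions for illustration).
/// Encodes one clear-only frame for `layer` and presents it.
/// Returns false if no drawable, command buffer or encoder could be created.
func renderFrame(layer: CAMetalLayer, queue: MTLCommandQueue) -> Bool {
    guard let drawable = layer.nextDrawable(),
          let commandBuffer = queue.makeCommandBuffer() else { return false }

    // Render into the drawable's texture; here we only clear it.
    let pass = MTLRenderPassDescriptor()
    pass.colorAttachments[0].texture = drawable.texture
    pass.colorAttachments[0].loadAction = .clear
    pass.colorAttachments[0].storeAction = .store
    pass.colorAttachments[0].clearColor = MTLClearColor(red: 0, green: 0, blue: 0, alpha: 1)

    guard let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: pass) else { return false }
    encoder.endEncoding()

    // The present call is the boundary of a frame: command buffers committed
    // between two presents belong to one captured frame.
    commandBuffer.present(drawable)
    commandBuffer.commit()
    return true
}
```

If you commit extra command buffers (for example, an offscreen pass) between two presents, they land in the same captured frame.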

When capturing on a device, the Profile after Replay option is available. It gives you some performance insights, but I would recommend using Instruments for better precision.

As you can see, there's not much control over the scope on the UI side, but you can set it up in code. I'm going to explain this topic in the next episode.
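
Just as a teaser, here is a rough sketch of starting a capture from code with MTLCaptureManager. The helper name `captureGPUWork` is hypothetical, and I assume an OS version where MTLCaptureDescriptor is available (iOS 13 / macOS 10.15 or later). Treat it as an illustration rather than the recommended setup.

```swift
import Metal

/// Hypothetical helper (not part of Metal): captures the command buffers
/// committed inside `work`, using the whole device as the capture object.
/// Returns false if the capture could not be started.
func captureGPUWork(on device: MTLDevice, _ work: () -> Void) -> Bool {
    let manager = MTLCaptureManager.shared()

    let descriptor = MTLCaptureDescriptor()
    // The capture object may also be an MTLCommandQueue or an MTLCaptureScope,
    // which roughly corresponds to the scopes offered in the UI.
    descriptor.captureObject = device
    descriptor.destination = .developerTools // show the result in Xcode

    do {
        try manager.startCapture(with: descriptor)
    } catch {
        // Starting can fail, e.g. when capturing is not enabled for the process.
        print("Capture did not start: \(error)")
        return false
    }

    work()                // commit the command buffers you want to see
    manager.stopCapture() // the capture ends here
    return true
}
```

Usage could look like `captureGPUWork(on: device) { /* encode and commit */ }`. Note that `work` runs only if the capture actually started.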

Overview of captured data

When Xcode has captured enough command buffers (according to your settings), it takes you to the capture results. It also pauses your app; afterwards you can resume it and do another capture or whatever you want.

Summary

This section has an overview of captured command buffers, detected issues, recommendations on how to avoid them, and basic performance insights.

  1. Queue call hierarchy - provides access to all sections and to the queue of command buffers. They are presented as a tree, where you can find particular encoders, their draw or dispatch calls, and related data.
  2. Insights list - shows the obvious issues that Xcode detected during capturing.
  3. Insights details - actually a part of (2), but here you can find more details on the selected issue, recommendations on how to fix it, and links to related documentation.
  4. Performance overview - doesn't contain many details, but provides some numbers that can be helpful and often let you understand an issue without diving deeper.
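
The call hierarchy (1) is much easier to read if your objects have labels. Below is a small sketch of that idea; the helper name and the label strings are made up for illustration, while `label`, `pushDebugGroup` and `popDebugGroup` are regular Metal API.

```swift
import Metal

/// Hypothetical helper: commits an empty but labelled blit pass, so the
/// capture tree shows readable names instead of default ones.
func encodeLabelledWork(queue: MTLCommandQueue) -> MTLCommandBuffer? {
    guard let commandBuffer = queue.makeCommandBuffer(),
          let encoder = commandBuffer.makeBlitCommandEncoder() else { return nil }

    commandBuffer.label = "Offscreen pass" // shown for the command buffer
    encoder.label = "Upload textures"      // shown for the encoder

    // Debug groups add one more nesting level inside the encoder.
    encoder.pushDebugGroup("Mip generation")
    // ... blit calls would go here ...
    encoder.popDebugGroup()

    encoder.endEncoding()
    commandBuffer.commit()
    return commandBuffer
}
```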

Dependencies

I've included several zoom levels of this section's view for better understanding.

The diagram shows all dependencies between your Metal calls, encoders, command buffers and the data used in them. It presents in a convenient form how resources are used in your Metal pipeline, their types, previews (for textures), etc. It's a convenient way to check visually that you pass the proper buffers or textures between calls.

Performance

This section provides:

  • performance overview,
  • timeline with most important metrics plots,
  • shader statistics,
  • heat map (shader execution cost for every pixel),
  • cost graph (performance per function line),
  • counters (more detailed metrics).

I'm going to explain this section in much more detail in later episodes, because it's a really big and important topic.

Memory

This section gives you data about memory allocation and type for every resource. It's convenient for understanding where and how you spend your memory, and whether any resources are left hanging around.
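
Resource labels help here too, and you can also query allocation sizes from code. The following is a minimal sketch; the helper `makeLabelledBuffer` is hypothetical, while `label`, `allocatedSize` and `currentAllocatedSize` are existing Metal properties.

```swift
import Metal

/// Hypothetical helper: creates a shared buffer, labels it so it is easy to
/// find in the Memory section, and prints how much was actually allocated.
func makeLabelledBuffer(device: MTLDevice, length: Int, label: String) -> MTLBuffer? {
    guard let buffer = device.makeBuffer(length: length, options: .storageModeShared) else {
        return nil
    }
    buffer.label = label
    // allocatedSize can be larger than the requested length.
    print("\(label): requested \(length) B, allocated \(buffer.allocatedSize) B")
    print("Device total: \(device.currentAllocatedSize) B")
    return buffer
}
```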

Draw/Dispatch Call

This is the most important and powerful section, so here I only briefly explain what it's about and will describe it in more detail in another episode.

  • Attachments shows the textures attached to the selected encoder, along with their previews.

  • Geometry shows vertex shader output.

  • Bound Resources provides a list of resources bound to the encoder at the time of the Metal call.

  • All Resources provides a list of all allocated resources (sometimes you can find something there that you forgot to bind).

  • Pipeline Statistics shows statistics for the selected call (and a list of similar calls in the captured command buffers). Convenient for estimating performance improvements on the fly while changing shaders at run time.

  • Performance shows metrics for this particular call. Use it if you need really deep insights for optimisation.

  • Call Stack shows the CPU call stack that led to this Metal call. In most cases you already know it, but sometimes the path to the call can be very surprising.

Conclusion

  • Xcode provides powerful tools for capturing Metal run-time data from the GPU.
  • You have several options for the capturing scope, but for more control you need to set it up in code.
  • Metal capture gives you lots of information about resources and performance, and even allows you to debug your GPU code.
