As Metal is closely tied to the GPU, one of its obvious purposes is rendering. But what exactly is rendering? Let’s explore this concept. In this episode, we’ll focus on general principles rather than diving into the specific details of Metal.
DISCLAIMER: Some details here aren’t directly covered in the Metal documentation. They are based on my own experiments and comparisons with OpenGL’s rendering pipeline, which, in my experience, shares general similarities. If you have any feedback or insights, feel free to share!
Let’s begin with some key terminology:
Rendering, or image synthesis, is the process of generating either a photorealistic or non-photorealistic image from a 2D or 3D model through a computer program. It can also refer to calculating visual effects in video editing software to produce the final video output.
There are two main approaches to rendering: rasterization and ray tracing.
Rasterization is significantly faster than ray tracing, although it doesn’t achieve the same level of quality. Historically, rasterization has been used in most rendering engines. Metal primarily uses rasterization, but it also supports ray tracing to add more detail where needed.
I might write an article about rasterization on the CPU for a deeper dive into its mechanics, but that’s beyond the scope of this episode. As for ray tracing in Metal, we’ll explore that in future episodes, as it’s a bit more complex for beginners.
Let’s start with a general overview of the rendering pipeline. While we could reference Apple’s official diagram, it’s a bit too high-level for our purposes:

I decided to add a bit more detail for better understanding, while still skipping most of the low-level specifics. I’ll focus on the key points that are useful in most tasks. Although I’m not sure what a complete diagram of the Metal rendering pipeline looks like, you could refer to the OpenGL pipeline. I’m confident the overall concepts are quite similar.

This is the first step of the rendering pipeline, where we perform the following tasks:
Yes, we can render multiple instances of the same model with the same set of buffers and parameters, while managing per-instance differences through the shaders.
A more detailed explanation will follow in the 8th episode.
Once we’ve dispatched our draw call, Metal maps the buffers we passed into the vertex shader’s structures. You can assign vertex properties from different buffers to corresponding attributes in the input ([[stage_in]]) structure, or simply pass the buffers directly and handle them using the vertex index ([[vertex_id]]). More details on this will be covered in the 9th episode.
The vertex shader is then invoked for every vertex of every instance, calculating the position in the viewport volume and other necessary parameters. This means:
The output of the vertex shader is a structure that must contain a field marked with the [[position]] attribute, representing the vertex’s position in the viewport. Additionally, you can include any other fields needed for the fragment shader, such as normals or texture coordinates. There’s no need to manually interpolate these values between vertices—Metal takes care of that for you.
After positioning the vertices in the viewport, Metal performs several steps, which are controlled implicitly through the encoder settings, draw call parameters, and pipeline state.
First, Metal assembles primitives based on the type specified in the draw call. Metal supports the following primitive types: points, lines, and triangles; lines and triangles can also be assembled from line strips and triangle strips. Each primitive type has its own specific use cases and nuances.

Next, these primitives are clipped to fit within the specified viewport volume. Depending on the primitive type, this process works as follows:
During clipping, new outputs are created using linear interpolation.

For triangles, a face culling step can also be applied, which discards triangles based on their orientation (determined by the clockwise or counterclockwise order of their vertices). This is typically used to hide triangles that are naturally invisible due to their orientation relative to the viewer.

The next step is rasterization of the primitives. Since the primitives are already projected, Metal treats them as 2D shapes with an additional depth component. At this stage, it determines whether each pixel lies inside or outside the primitive and interpolates vertex outputs:

The result of rasterization is a fragment, which contains data for each pixel (though, in some cases, multiple fragments per pixel can be generated for multisampling purposes). This data includes:
The fragments from the previous step undergo several processing stages, with the order varying depending on the configuration:
The fragment shader receives the rasterizer's interpolated outputs ([[stage_in]]), along with any additional parameters you need to pass in. The result of the fragment shader is new values for the attachments (if multiple attachments are used).

This is where the final processing of pixel values occurs:
After these operations, the final result is written to the attachment texture.