Hello Triangle, Meet Swift! (And Wide Color)

Two triangles rendered with Metal

The colors at left were gamma encoded after interpolation; those on the right were not.

For an iOS developer wanting to get their feet wet with Metal, a natural place to start is Apple’s Hello Triangle demo.

It is truly the “Hello World” of Metal. All it does is render a two-dimensional triangle, whose corners are red, green and blue, into an MTKView. The vertex and fragment shaders are about as simple as you can get. Even so, it’s a great way to start figuring out how the pieces of the pipeline fit together.

The only thing is—it’s written in Objective-C.

As a Swift developer, I found myself wishing I could see a version of Hello Triangle in that language. So I decided to convert it to Swift. (The conversion itself was pretty straightforward: You can see the code in this repo.)

To spice things up a little, I also updated the demo to support wide color, which in Apple’s ecosystem means using the Display P3 color space. (Wide color refers to the ability to display colors outside of the traditional gamut, known as sRGB; it’s something I explored in this earlier post.)

Supporting wide color in Hello Triangle is conceptually simple: Instead of setting the vertices to pure red, green and blue as defined in sRGB, set them to the pure red, green and blue as defined in Display P3. On devices that support it, the corners of the triangle will appear brighter and more vivid.

But as a Metal novice, I found it a bit tricky. In macOS, the MTKView class has a settable colorspace property, which presumably makes things fairly simple—but in iOS, that property isn’t available.

For that reason, it wasn’t immediately clear to me where in the Metal pipeline to make adjustments for wide color support.

I found an answer in this excellent Stack Overflow reply and related blog post. The author explains how to convert Display P3 color values (which range from 0.0 to 1.0, but actually refer to a wider-than-normal color space) to extended sRGB values (which is comparable to normal sRGB except the values can be negative or greater than 1.0) with the help of a matrix transform. The exact math depends on the colorPixelFormat of the MTKView, which determines where the gamma gets applied.

OK, so about gamma: the gist of gamma correction is that color intensities are often passed through a non-linear function before saving an image. Because most images have only 256 luminance levels, and the human eye is very sensitive to changes in dark colors, the gamma function helps store more darks, sacrificing bright intensities. The values are then passed through an inverse function when presenting on a display.

Because gamma encoding is not linear, values that are evenly spaced before encoding (also known as “compression”) won’t be evenly spaced after the encoding. (This blog post has a superb explanation of gamma correction for those who aren’t familiar.)

There’s a lot of implicit gamma encoding and decoding that can happen in the Metal pipeline, and if you manipulate values without knowing which state you’re in, things can get screwed up fast.

As I learned from those earlier blog posts, there are a couple of options for handling gamma when rendering in wide color to an MTKView:

  1. convert your Display P3 color values to their linear (non-encoded) counterpart in sRGB, and allow the MTKView to apply the gamma encoding for you (by choosing pixel format .bgra10_xr_srgb), or
  2. convert the P3 values to linear sRGB and then pre-apply the gamma encoding yourself mathematically, choosing the pixel format .bgra10_xr.
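In MTKView terms, that’s the only configuration difference between the two options: which extended-range pixel format the view uses. A minimal sketch (the helper function and its name are mine, not part of the demo):

import MetalKit

// Pick the pixel format based on whether the vertex colors were already
// gamma encoded in Swift (option 2) or are still linear and should be
// encoded by the view at presentation time (option 1).
func configurePixelFormat(of view: MTKView, vertexColorsArePreEncoded: Bool) {
    view.colorPixelFormat = vertexColorsArePreEncoded
        ? .bgra10_xr        // option 2: values arrive already gamma encoded
        : .bgra10_xr_srgb   // option 1: the view applies the gamma encoding
}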

In this demo, this is the difference between converting the left corner’s “extended” color to 1.2249, -0.04203, -0.0196 (which is P3’s reddest red, converted to linear sRGB) and converting it to 1.0930, -0.2267, -0.1501 (P3’s reddest red as sRGB with gamma encoding applied; these are the numbers you would get if you used Apple’s ColorSync utility to convert to sRGB).

While these conversions are probably best done in a shader, I only had three vertices to handle, so I did it in Swift code using matrix math (see below).

After trying options 1 and 2 above, I noticed an interesting difference in the visual results: when I let the MTKView apply gamma compression to my vertex colors (option 1, pictured above at left), the interior of the triangle was much lighter than when I used the technique in option 2 (right).

The issue was this: In option 1, not only were my triangle’s corners being assigned gamma-compressed values, but so were all of the pixels in between.

The way GPUs work is that values in between the defined vertices are computed automatically using a linear interpolation (or, strictly speaking, a barycentric interpolation) before being passed to the fragment shader.

After the interpolation (which occurred in linear space), the gamma encoding moved all of the pixels toward lighter intensities (higher numbers, closer to 1.0).

But when I applied gamma encoding to the converted vertex colors “by hand” (option 2) and set the MTKView to the colorPixelFormat of .bgra10_xr, only the corners were gamma encoded, and the interpolation was effectively done in gamma space. The result was a triangle whose corners were the same color as in option 1, but whose interior values were biased toward the dark end, because of the nature of the gamma function described above.
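A quick back-of-the-envelope sketch (one channel, plain sRGB, ignoring the extended-range components) shows the size of that bias at the midpoint between the red and green corners:

import Foundation

// Standard sRGB gamma encoding for a single channel (linear -> encoded)
func gammaEncode(_ c: Float) -> Float {
    return c <= 0.0031308 ? c * 12.92 : 1.055 * powf(c, 1 / 2.4) - 0.055
}

// Red channel halfway between the red corner (1.0) and the green corner (0.0):

// Option 1: interpolate in linear space, then let the view encode.
// The midpoint lands around 0.74, a fairly light value.
let option1 = gammaEncode((1.0 + 0.0) / 2)

// Option 2: encode the corners first, then interpolate the encoded values.
// The midpoint stays at 0.5, so the interior reads noticeably darker.
let option2 = (gammaEncode(1.0) + gammaEncode(0.0)) / 2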

While neither option is necessarily wrong, you might argue that option 1 (interpolating in linear space) seems more natural, because light is additive in linear space.

Some specifics below:

Using this matrix and conversion functions from endavid…

// Linear Display P3 -> linear sRGB. Stored transposed (each column here is a
// row of the usual matrix) so it can be applied as `vector * matrix` in toSRGB.
private static let linearP3ToLinearSRGBMatrix: matrix_float3x3 = {
    let col1 = float3([1.2249,  -0.2247,  0])
    let col2 = float3([-0.0420,   1.0419,  0])
    let col3 = float3([-0.0197,  -0.0786,  1.0979])
    return matrix_float3x3([col1, col2, col3])
}()

extension float3 {
    var gammaDecoded: float3 {
        let f = { (c: Float) -> Float in
            if abs(c) <= 0.04045 {
                return c / 12.92
            }
            return sign(c) * powf((abs(c) + 0.055) / 1.055, 2.4)
        }
        return float3(f(x), f(y), f(z))
    }

    var gammaEncoded: float3 {
        let f = { (c: Float) -> Float in
            if abs(c) <= 0.0031308 {
                return c * 12.92
            }
            return sign(c) * (powf(abs(c), 1/2.4) * 1.055 - 0.055)
        }
        return float3(f(x), f(y), f(z))
    }
}

…and a conversion function like this…

func toSRGB(_ p3: float3) -> float4 {
    // Note: gamma decoding is not strictly necessary in this demo
    // because 0 and 1 always decode to 0 and 1
    let linearSrgb = p3.gammaDecoded * linearP3ToLinearSRGBMatrix
    let srgb = linearSrgb.gammaEncoded
    return float4(x: srgb.x, y: srgb.y, z: srgb.z, w: 1.0)
}

…the color adjustment went like this:

let p3red = float3([1.0, 0.0, 0.0])
let p3green = float3([0.0, 1.0, 0.0])
let p3blue = float3([0.0, 0.0, 1.0])

let vertex1 = Vertex(position: leftCorner, color: toSRGB(p3red))
let vertex2 = Vertex(position: top, color: toSRGB(p3green))
let vertex3 = Vertex(position: rightCorner, color: toSRGB(p3blue))

let myWideColorVertices = [vertex1, vertex2, vertex3]

I hope this port helps someone out there. And huge thanks to David Gavilan for his informative blog posts and for his incredibly helpful feedback on this post.

Hello Triangle Swift

Adventures in Wide Color: An iOS Exploration

I used to think the reddest red around was 0xFF0000, and that there wasn’t much more to say about it.

And then a few weeks ago, I watched one of Apple’s videos about working with Wide Color. It drove home the point that many visible colors simply can’t be rendered on certain devices, and, by implication, that there was a whole world of reds (and oranges and greens) that I just hadn’t been seeing on my iPhone 6s.

A few days later, I got my iPhone X — and suddenly I could capture these formerly hidden colors, and see them rendered up close, on a gorgeous OLED display.

It was like a veil had been lifted on my perception and appreciation of color.

To help me understand wide color better, I decided to write an experimental iOS app to identify these colors around me, in real time. The basic idea, inspired by this sample code from Apple, was this: Make an app that streams live images from the camera and, for each frame, highlights all the colors outside the standard range for legacy displays. Colors inside the standard range would be converted to grayscale; colors outside would be allowed to pass through unchanged. (Skip to the end for example screenshots.)

First, some background: Until the release of the iPhone 7, iPhone screens used the standard Red Green Blue (sRGB) color space, which is more than 20 years old. Starting with iPhone 7, iPhones began supporting the Display P3 color space, a superset of sRGB that can display more of the visual color spectrum.

How much more? Here’s a 3-D rendering of how they compare:

P3’s color gamut is about 25% larger than sRGB’s

As this makes clear, while P3 and sRGB converge near the “poles” of white and black, P3 extends much further near the “equator,” where the brightest colors lie. (To be clear, both spaces only cover a portion of all colors visible to the human eye.)

While the “reddest” corner of the sRGB gamut (the lower left of the inner cube) would be represented in sRGB by the color coordinates (r: 1.0, g: 0.0, b: 0.0) — where 1.0 represents the maximum value of the space’s red channel — the same point converted into P3 space would be (r: 0.9175, g: 0.2003, b: 0.1387).

Conversely, the corresponding corner of the outer P3 gamut, described in that space as (1.0, 0.0, 0.0), lies outside of sRGB and cannot be expressed in that color space at all.
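If you want to verify those numbers yourself, Core Graphics can do the conversions. A rough sketch (the exact decimals can vary a little with rounding and rendering intent):

import UIKit

// sRGB's reddest red, as a CGColor
let srgbRed = UIColor(red: 1, green: 0, blue: 0, alpha: 1).cgColor

// Converted into Display P3, its components come out at roughly
// (0.9175, 0.2003, 0.1387): well inside the P3 gamut.
if let p3Space = CGColorSpace(name: CGColorSpace.displayP3),
   let srgbRedInP3 = srgbRed.converted(to: p3Space, intent: .defaultIntent, options: nil) {
    print(srgbRedInP3.components ?? [])
}

// P3's reddest red, converted the other way into extended sRGB, comes back
// with a red component greater than 1.0; it has no in-gamut sRGB equivalent.
let p3Red = UIColor(displayP3Red: 1, green: 0, blue: 0, alpha: 1).cgColor
if let extendedSRGB = CGColorSpace(name: CGColorSpace.extendedSRGB),
   let p3RedInSRGB = p3Red.converted(to: extendedSRGB, intent: .defaultIntent, options: nil) {
    print(p3RedInSRGB.components ?? [])
}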

But enough theory. Back to my project. Here’s a rough outline of what I did:

  • Set up an AVCaptureSession that streams pixel buffers from the camera, in the P3 color space, if it’s supported.
  • Created a CIContext whose workingColorSpace is Apple’s extended sRGB color space. Using the extended sRGB format is crucial because “wide” color information will be both preserved and easily identifiable after converting from P3. Unlike sRGB, which clamps values to a range from 0.0 to 1.0 and thus discards any wide-color information, extended sRGB allows values outside of that range, which leaves open the possibility that wide-color-aware displays can use them.
  • Wrote a Metal fragment shader that allows wide colors to pass through unchanged, but converts “narrow” colors to a shade of gray.
  • Used the CIContext and a custom CIFilter, built with the Metal shader, to take each pixel buffer in the stream, filter it, and render it to the screen.

Step 1: Creating the AVCaptureSession

Apple’s AVCam sample project is an excellent template for how to capture images from the camera, and I was able to adapt it for my project with few changes.

In my case, though, I needed more than what the sample code’s AVCaptureVideoPreviewLayer could provide: I needed access to the video capture itself, so I could process each pixel buffer in real time. At the same time, though, I needed to make sure I was preserving wide-color information.

This added a small complication, which forced me to understand how an AVCaptureSession decides whether or not to capture wide color by default.

Left to its own devices (pun intended), an AVCaptureSession will try to do the “right thing” as it relates to wide color, thanks to a tongue-twisting property introduced in iOS 10 called automaticallyConfiguresCaptureDeviceForWideColor. When set to true (the default), the session automatically sets the device’s active color space to P3 if a) the device supports wide color and b) the session configuration suggests that wide color makes sense.

But when, according to the default behavior, does wide color “make sense”?

For starters, an AVCapturePhotoOutput must be attached to the AVCaptureSession. But if you also attach an AVCaptureVideoDataOutput — as I did, because I wanted to capture a live stream — you need to be careful. Because Display P3 is not well-supported in video, the automatic configuration will revert to sRGB if it thinks the destination is a movie file.

The trick for staying in the P3 color space, in this case, is to make your non-movie intentions clear by doing this:

session.sessionPreset = .photo

With that done, I confirmed the capture of wide color by checking that, once session.commitConfiguration was called, device.activeColorSpace changed from sRGB to P3_D65.
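Putting that together, the session setup looked roughly like the sketch below (condensed: real code needs error handling, permission checks, and the delegate wiring from AVCam, and the force unwraps are only for brevity):

import AVFoundation

let session = AVCaptureSession()
session.beginConfiguration()

// The .photo preset signals non-movie intentions, so the automatic
// wide-color configuration keeps the device in Display P3.
session.sessionPreset = .photo

// Camera input
let device = AVCaptureDevice.default(.builtInWideAngleCamera, for: .video, position: .back)!
let input = try! AVCaptureDeviceInput(device: device)
if session.canAddInput(input) { session.addInput(input) }

// A photo output must be attached for wide color to be considered...
let photoOutput = AVCapturePhotoOutput()
if session.canAddOutput(photoOutput) { session.addOutput(photoOutput) }

// ...and the video data output delivers the pixel buffers to be filtered.
let videoOutput = AVCaptureVideoDataOutput()
if session.canAddOutput(videoOutput) { session.addOutput(videoOutput) }

session.commitConfiguration()

// Sanity check: on a wide-color-capable device this should now be .P3_D65.
print(device.activeColorSpace)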

Step 2: Creating the CIContext

It’s easy to lose wide-color information when rendering an image. As Mike Krieger of Instagram points out in this great blog post, iOS 10 introduced a piece of wide-color-aware API called UIGraphicsImageRenderer to help with the rendering of wide-color images in Core Graphics.

With Core Image, on the other hand, you need to make sure your CIContext’s working color space and pixel format are configured correctly.

Here’s the setup that worked for me: the working color space had to support extended sRGB, as you’d expect (to handle values below 0.0 or above 1.0), and the pixel format had to use floats (for similar reasons).

private lazy var ciContext: CIContext = {
    let space = CGColorSpace(name: CGColorSpace.extendedSRGB)
    let format = NSNumber(value: kCIFormatRGBAh) // full-float pixels
    var options = [String: Any]()
    options[kCIContextWorkingColorSpace] = space
    options[kCIContextWorkingFormat] = format
    return CIContext(options: options)
}()

Set up in this way, a CIContext can preserve extended sRGB data when it renders an image.

Step 3: Creating the CIFilter

The next step was building a filter to convert “non-wide” pixels to shades of gray. I decided an interesting way to do this would be to create a custom CIFilter that was backed by a Metal shader. The basic steps were:

  1. Write the Metal shader
  2. Create a CIKernel from the shader
  3. Create a CIFilter subclass to apply the CIKernel

Steps 2 & 3 are pretty well covered in this WWDC 2017 video. As for creating the shader, I was able to borrow some code from Apple’s very cool Color Gamut Showcase sample app.

It’s wonderfully simple: If the inbound color is greater than 1.0 or below 0.0, leave it alone. Otherwise, convert it to grayscale.

#include <metal_stdlib>
#include <CoreImage/CoreImage.h>
using namespace metal;

// A channel is "wide" if it falls outside the normal 0.0-1.0 sRGB range.
static bool isWideGamut(float value) {
    return value > 1.0 || value < 0.0;
}

extern "C" {
    namespace coreimage {
        float4 wide_color_kernel(sampler src) {
            float4 color = src.sample(src.coord());
            if (isWideGamut(color[0])
                || isWideGamut(color[1])
                || isWideGamut(color[2])) {
                // Wide color: pass it through unchanged.
                return color;
            } else {
                // In-gamut color: replace it with its luminance (Rec. 601 weights).
                float3 weights = float3(0.3, 0.59, 0.11);
                float luminance = dot(weights, color.rgb);
                return float4(float3(luminance), 1.0);
            }
        }
    }
}
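For steps 2 and 3, here’s a rough sketch of how the kernel might be loaded and applied (it assumes the shader above is compiled with the Core Image Metal flags covered in that WWDC video and ends up in the bundle as default.metallib; the WideColorFilter name is mine, not Apple’s):

import CoreImage

final class WideColorFilter: CIFilter {
    @objc dynamic var inputImage: CIImage?

    // Step 2: build a CIKernel from the compiled Metal shader.
    private static let kernel: CIKernel = {
        let url = Bundle.main.url(forResource: "default", withExtension: "metallib")!
        let data = try! Data(contentsOf: url)
        return try! CIKernel(functionName: "wide_color_kernel", fromMetalLibraryData: data)
    }()

    // Step 3: apply the kernel to the input image.
    override var outputImage: CIImage? {
        guard let input = inputImage else { return nil }
        return WideColorFilter.kernel.apply(extent: input.extent,
                                            roiCallback: { _, rect in rect },
                                            arguments: [input])
    }
}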

Step 4: Putting It Together

With that working, the last step was to grab each pixel buffer as it arrives, apply the filter, and then display it to the screen. This involved implementing an AVCaptureVideoDataOutputSampleBufferDelegate callback method, which I set up to be called on a dedicated, serial background queue.

After turning the CMSampleBuffer into a CIImage, I moved to a dedicated rendering queue and used my CIContext to render the CIImage to a CGImage, which then became a UIImage and was displayed on the screen, thanks to a plain old UIImageView.
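In code, that path looked something like this sketch (simplified, and it assumes the hypothetical WideColorFilter above; in the real project the context is the extended-sRGB CIContext from step 2):

import AVFoundation
import CoreImage
import UIKit

final class CameraViewController: UIViewController, AVCaptureVideoDataOutputSampleBufferDelegate {
    let imageView = UIImageView()                                   // plain old UIImageView
    private let renderQueue = DispatchQueue(label: "render-queue")  // dedicated rendering queue
    private let filter = WideColorFilter()                          // the custom CIFilter from step 3
    private lazy var ciContext = CIContext()                        // stand-in for the extended-sRGB context from step 2

    // Called on the dedicated, serial video queue for every captured frame.
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

        // CMSampleBuffer -> CIImage, run through the wide-color filter.
        filter.inputImage = CIImage(cvPixelBuffer: pixelBuffer)
        guard let filtered = filter.outputImage else { return }

        // Render on the dedicated queue, then display on the main thread.
        renderQueue.async {
            guard let cgImage = self.ciContext.createCGImage(filtered, from: filtered.extent) else { return }
            DispatchQueue.main.async {
                self.imageView.image = UIImage(cgImage: cgImage)
            }
        }
    }
}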

Some disclaimers on this last part: I didn’t spend much time worrying about performance here, and it’s quite possible that on slow devices, the render queue could fail to keep up and become swamped with rendering tasks. In the real world, there would need to be a way to slow down the capture frame rate if the renderer couldn’t keep up.
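One simple approach, sketched here on the assumption that a 15 fps cap would be acceptable, is to raise the device’s minimum frame duration:

import AVFoundation

// Hypothetical helper: cap the capture frame rate so a slow renderer
// isn't flooded with more frames than it can process.
func capFrameRate(of device: AVCaptureDevice, to fps: Int32) throws {
    try device.lockForConfiguration()
    device.activeVideoMinFrameDuration = CMTime(value: 1, timescale: fps)
    device.unlockForConfiguration()
}

// e.g. try capFrameRate(of: device, to: 15)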

Also, there are surely more efficient ways to display each CMSampleBuffer than creating a UIImage and assigning it to a UIImageView. For one thing, a more performant implementation would resize the image to the exact size of the display view during the rendering pass. (This sample Apple code turned each pixel buffer into an OpenGL ES texture, which frankly seemed like a lot of work for this little experiment.) I’m interested to hear how others would have approached this!

Up and Running

In any event, the experiment app ran very smoothly on my iPhone X: Core Image seemed more than capable of handling the 30 camera frames per second it was being asked to render. Meanwhile, I was surprised how much wide color I found in the world — even on a gray day in downtown Manhattan.

You can see a few examples of screenshots below.

And here’s a link to my WideColorViewer project.

(Cross posted from “Adventures in Wide Color: An iOS Exploration” on my Medium blog.)