Swift’s Codable and Stringly-Typed JSON Objects

So. Let’s say you’re in charge of making an iPhone app and a wearable device that work together to track your workouts and share them on social media.

And let’s say you expect the app and the device to send and receive, respectively, a fixed set of JSON commands with very different parameters in their payloads.

Each command will have a special key that lets us know the command type we’re dealing with. (This key is commonly something like type.) But beyond that, the structure of these various command types will be quite unrelated.

For example, here’s a hypothetical “Start Workout” command, which, in addition to command_type, has three additional fields:

{
    "command_type": "start_workout",
    "location": "Gary's Gym",
    "date": "2020-03-16 19:45:13 +0000",
    "intensityLevel": 5
}

And here’s an “End Workout” command, which has no extra info:

{
    "command_type": "end_workout"
}

And here’s a “Share Workout” command, which has one additional field:

{
    "command_type": "share_workout",
    "service": "twitter"
}

The challenge here is that you don’t know the type of command to parse from the JSON until you’ve read a string from a previously agreed-upon key. (In this example, that key is command_type.) This string completely determines which other fields (if any) to expect–and, more broadly, what type of command you are dealing with.

It’s not such an uncommon scenario. You might also imagine, say, a push notification whose payload contains a key describing the event that triggered the push (e.g. "push_type": "account_updated") and several other key-value pairs that are totally specific to that push trigger.

How can we use Swift to simplify the task of encoding and decoding these “stringly-typed” JSON commands in a type-safe way?

Obviously, the Codable protocol is a handy choice here. Used with the JSONEncoder and JSONDecoder types, we’ll get a lot of the encoding and decoding implementation for free.

But in this case, because the object we’re trying to represent (let’s call it a Command) takes many heterogeneous forms, there’s some additional complexity.

Of course, we could always just create a single type, conforming to Codable, that includes all of the properties of all of the command types. For example:

struct Command: Codable {
    let commandType: CommandType
    let location: String?
    let date: String?
    let intensityLevel: Int?
    let service: Service?
}

This doesn’t feel so great, though, if only because we’d be forced to make all of these properties Optional, since any given command type might only use a small subset.

If, instead, we made a totally separate type, conforming to Codable, for every command type, that would solve the problem of unused properties. But in this arrangement, we’d need to look into each JSON object in advance, inspecting the command_type key, before deciding which of these unrelated types to pass into JSONDecoder.decode(_:from:).
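To make that two-pass idea concrete, here is a minimal sketch. The names CommandHeader, StartWorkout, ShareWorkout and describe(_:) are made up for this illustration and don’t appear elsewhere in this post; the point is only that the same Data ends up being decoded twice.

```swift
import Foundation

// Hypothetical standalone payload types, one per command.
struct StartWorkout: Codable {
    let location: String
    let date: String
    let intensityLevel: Int
}

struct ShareWorkout: Codable {
    let service: String
}

// Hypothetical helper type that reads only the discriminator key.
struct CommandHeader: Decodable {
    let commandType: String

    enum CodingKeys: String, CodingKey {
        case commandType = "command_type"
    }
}

// Hypothetical function: peek at command_type, then decode the same
// bytes a second time as whichever type matches.
func describe(_ data: Data) throws -> String {
    let decoder = JSONDecoder()
    let header = try decoder.decode(CommandHeader.self, from: data)

    switch header.commandType {
    case "start_workout":
        let start = try decoder.decode(StartWorkout.self, from: data)
        return "Start at \(start.location)"
    case "end_workout":
        return "End workout"
    case "share_workout":
        let share = try decoder.decode(ShareWorkout.self, from: data)
        return "Share to \(share.service)"
    default:
        return "Unknown command: \(header.commandType)"
    }
}
```

Every call site would have to repeat this string switch, and nothing forces it to stay in sync with the set of command types.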

Alternatively, we could make several classes that descend from a common Codable ancestor — and I’ve seen some good implementations of this inheritance-based setup, including one here. This makes a lot of sense if the various types share certain properties in common.

With that approach, there is one disadvantage: we wouldn’t be able to exhaustively switch through the resulting subclasses, which means we might forget to handle new command types as they are added. (Unlike Kotlin, Swift doesn’t have a concept of “sealed classes,” so the compiler can’t check that we’ve exhaustively handled every possible subclass.)
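Here is a small sketch of that difference. The class and enum names are hypothetical and the Codable plumbing is left out; it only contrasts how the compiler treats a switch over subclasses with a switch over enum cases.

```swift
// Hypothetical class hierarchy, for illustration only.
class BaseCommand {}
final class StartCommand: BaseCommand {}
final class EndCommand: BaseCommand {}
final class ShareCommand: BaseCommand {}

func describe(_ command: BaseCommand) -> String {
    switch command {
    case is StartCommand:
        return "start"
    case is EndCommand:
        return "end"
    // ShareCommand is not handled here, and the compiler does not
    // complain: the default branch it requires absorbs the omission.
    default:
        return "unhandled"
    }
}

// The enum equivalent: every case must be listed, and no default is needed.
enum CommandKind {
    case start, end, share
}

func describe(_ kind: CommandKind) -> String {
    switch kind {
    case .start: return "start"
    case .end: return "end"
    case .share: return "share" // deleting this line is a compile-time error
    }
}
```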

For this exercise, we’d really like command parsing to look like this:

 
    do {
        let command = try JSONDecoder().decode(Command.self, from: data)
        switch command {
            case .startWorkout(let workout):
                print("Starting workout at \(workout.location)")
            case .endWorkout:
                print("Ending workout")
            case .shareWorkout(let service):
                print("Sharing workout to \(service)")
        }
    } catch {
        // Handle the error
    }

In this approach, we’d like to make a single call to JSONDecoder.decode(_:from:), and then switch on all of the possible cases to extract the specific, fully-typed payload for each case. (There’s no need for a default branch here; if the command type is unrecognized, we can handle that in the catch block.)

We can make this possible by declaring Command to be an enum whose cases have associated values, each of which (if it exists) conforms to Codable.

So the overarching type becomes something like this:

enum Command {
    case startWorkout(Workout)
    case endWorkout
    case shareWorkout(to: ShareService)
}

With the associated value types looking like this:

struct Workout: Codable {
    let location: String
    let date: String
    let intensityLevel: Int
}

struct ShareService: Codable {
    enum Service: String, Codable {
        case facebook, instagram, twitter
    }

    let service: Service
}

Now, we just need to write Command.encode(to:) and Command.init(from:) to make this happen.

Let’s start with the decoding.

First off, we’ll create a single new type conforming to CodingKey — called CommandKeys — that specifies the all-important key used to determine which kind of command we are parsing.

The second type we’ll create is CommandType, which specifies all the allowable values this key can have.

extension Command {
    enum CommandKeys: String, CodingKey {
        case commandType = "command_type"
    }

    enum CommandType: String, Codable {
        case start = "start_workout"
        case end = "end_workout"
        case share = "share_workout"
    }
}

With that preparation, all we need to do is implement init(from:), which does the actual parsing. Here’s the whole thing:

extension Command: Decodable {
    // CommandKeys and CommandType are declared in the extension above.
    init(from decoder: Decoder) throws {
        let values = try decoder.container(keyedBy: CommandKeys.self)
        let commandType = try values.decode(CommandType.self,
                                            forKey: .commandType)
        switch commandType {
        case .start:
            self = .startWorkout(try Workout(from: decoder))
        case .end:
            self = .endWorkout
        case .share:
            self = .shareWorkout(to: try ShareService(from: decoder))
        }
    }
}

The first two lines are standard for any custom implementation of Decodable.init(from:): get a keyed container, and start decoding values for keys (in this case, the commandType key).

At that point, we’re almost done. We just switch over the resulting enum and decode the object we need as an associated value. For example, the associated value type for the start command is a Workout — which itself is fully decodable, so we just need to call Workout(from: decoder).
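The reason this works is that a decoder can hand out more than one keyed container over the same JSON object, each with its own key type. Here is a stripped-down sketch of just that mechanic. Tagged and decodeTagged(_:) are hypothetical names used only here, and Workout is trimmed to two fields to keep the example short.

```swift
import Foundation

// Trimmed-down payload for this sketch (the date field is omitted).
struct Workout: Decodable {
    let location: String
    let intensityLevel: Int
}

// Hypothetical wrapper, not part of the Command type in this post.
struct Tagged<Payload: Decodable>: Decodable {
    let tag: String
    let payload: Payload

    enum TagKeys: String, CodingKey {
        case tag = "command_type"
    }

    init(from decoder: Decoder) throws {
        // Container 1: views the JSON object through TagKeys only.
        let tagContainer = try decoder.container(keyedBy: TagKeys.self)
        tag = try tagContainer.decode(String.self, forKey: .tag)

        // Container 2: Payload.init(from:) asks the same decoder for a
        // container keyed by its own CodingKeys and reads the sibling fields.
        payload = try Payload(from: decoder)
    }
}

// Hypothetical helper for trying it out.
func decodeTagged(_ json: String) throws -> Tagged<Workout> {
    try JSONDecoder().decode(Tagged<Workout>.self, from: Data(json.utf8))
}
```

Both the tag and the payload fields are read from the same top-level object, so nothing has to be nested under a separate key.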

Encoding is equally easy. We start by encoding the all-important commandType key, and finish by encoding the entire associated value (if there is one).

extension Command: Encodable {
    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CommandKeys.self)
        switch self {
        case .startWorkout(let workoutInfo):
            try container.encode(CommandType.start, forKey: .commandType)
            try workoutInfo.encode(to: encoder)
        case .endWorkout:
            try container.encode(CommandType.end, forKey: .commandType)
        case .shareWorkout(let shareInfo):
            try container.encode(CommandType.share, forKey: .commandType)
            try shareInfo.encode(to: encoder)
        }
    }
}
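For anyone who wants to try this out, here is everything gathered into one self-contained sketch. To keep it compact, it folds both conformances into the enum declaration instead of using extensions, and it adds Equatable plus a hypothetical roundTrip(_:) helper that are not part of the code above; they exist only so the result can be compared.

```swift
import Foundation

struct Workout: Codable, Equatable {
    let location: String
    let date: String
    let intensityLevel: Int
}

struct ShareService: Codable, Equatable {
    enum Service: String, Codable {
        case facebook, instagram, twitter
    }
    let service: Service
}

enum Command: Codable, Equatable {
    case startWorkout(Workout)
    case endWorkout
    case shareWorkout(to: ShareService)

    enum CommandKeys: String, CodingKey {
        case commandType = "command_type"
    }

    enum CommandType: String, Codable {
        case start = "start_workout"
        case end = "end_workout"
        case share = "share_workout"
    }

    init(from decoder: Decoder) throws {
        let values = try decoder.container(keyedBy: CommandKeys.self)
        switch try values.decode(CommandType.self, forKey: .commandType) {
        case .start:
            self = .startWorkout(try Workout(from: decoder))
        case .end:
            self = .endWorkout
        case .share:
            self = .shareWorkout(to: try ShareService(from: decoder))
        }
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CommandKeys.self)
        switch self {
        case .startWorkout(let workout):
            try container.encode(CommandType.start, forKey: .commandType)
            try workout.encode(to: encoder)
        case .endWorkout:
            try container.encode(CommandType.end, forKey: .commandType)
        case .shareWorkout(let share):
            try container.encode(CommandType.share, forKey: .commandType)
            try share.encode(to: encoder)
        }
    }
}

// Hypothetical helper: encode a command to JSON and decode it back.
func roundTrip(_ command: Command) throws -> Command {
    let data = try JSONEncoder().encode(command)
    return try JSONDecoder().decode(Command.self, from: data)
}
```

A command that survives roundTrip(_:) unchanged is a decent sanity check that the encoder and decoder agree on the flattened layout. A JSON object with an unrecognized command_type makes decode throw, which is the case the catch block in the earlier switch is there for.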

(Note: After writing this up, I found this blog post that beautifully explains this same concept of coding heterogeneous JSON. The author’s example assumes each object type’s properties are gathered under an attributes property — this example shows what you might do if these properties were instead at the top level.)

Hello Triangle, Meet Swift! (And Wide Color)

Two triangles rendered with Metal

The colors at left were gamma encoded after interpolation; those on the right were not.

For an iOS developer wanting to get their feet wet with Metal, a natural place to start is Apple’s Hello Triangle demo.

It is truly the “Hello World” of Metal. All it does is render a two-dimensional triangle, whose corners are red, green and blue, into an MTKView. The vertex and fragment shaders are about as simple as you can get. Even so, it’s a great way to start figuring out how the pieces of the pipeline fit together.

The only thing is, it’s written in Objective-C.

As a Swift developer, I found myself wishing I could see a version of Hello Triangle in that language. So I decided to convert it to Swift. (The conversion itself was pretty straightforward: You can see the code in this repo.)

To spice things up a little, I also updated the demo to support wide color, which in Apple’s ecosystem means using the Display P3 color space. (Wide color refers to the ability to display colors outside of the traditional gamut, known as sRGB; it’s something I explored in this earlier post.)

Supporting wide color in Hello Triangle is conceptually simple: Instead of setting the vertices to pure red, green and blue as defined in sRGB, set them to the pure red, green and blue as defined in Display P3. On devices that support it, the corners of the triangle will appear brighter and more vivid.

But as a Metal novice, I found it a bit tricky. In macOS, the MTKView class has a settable colorspace property, which presumably makes things fairly simple, but in iOS, that property isn’t available.

For that reason, it wasn’t immediately clear to me where in the Metal pipeline to make adjustments for wide color support.

I found an answer in this excellent Stack Overflow reply and related blog post. The author explains how to convert Display P3 color values (which range from 0.0 to 1.0, but actually refer to a wider-than-normal color space) to extended sRGB values (which are comparable to normal sRGB values, except that they can be negative or greater than 1.0) with the help of a matrix transform. The exact math depends on the colorPixelFormat of the MTKView, which determines where the gamma gets applied.

OK, so about gamma: the gist of gamma correction is that color intensities are often passed through a non-linear function before saving an image. Because most images have only 256 luminance levels, and the human eye is very sensitive to changes in dark colors, the gamma function helps store more darks, sacrificing bright intensities. The values are then passed through an inverse function when presenting on a display.

Because gamma encoding is not linear, values that are evenly spaced before encoding (also known as “compression”) won’t be evenly spaced after the encoding. (This blog post has a superb explanation of gamma correction for those who aren’t familiar.)
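A quick way to see this is to push a few evenly spaced linear values through the sRGB encoding curve. This is only a sketch: gammaEncode is a standalone scalar version (for non-negative inputs) of the gammaEncoded helper shown further down, using the same constants.

```swift
import Foundation

// sRGB transfer function, linear -> gamma encoded, for non-negative inputs.
func gammaEncode(_ c: Double) -> Double {
    c <= 0.0031308 ? c * 12.92 : 1.055 * pow(c, 1 / 2.4) - 0.055
}

// Five evenly spaced linear intensities.
let linear = [0.0, 0.25, 0.5, 0.75, 1.0]
let encoded = linear.map(gammaEncode)

// Distance between neighbouring encoded values.
let steps = zip(encoded.dropFirst(), encoded).map { $0 - $1 }

for (l, e) in zip(linear, encoded) {
    print(String(format: "%.2f -> %.4f", l, e))
}
print(steps)
```

The first step (0.0 to 0.25) takes up more than half of the encoded range, and each later step is smaller than the one before it. That is the "more room for darks" trade-off described above.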

There’s a lot of implicit gamma encoding and decoding that can happen in the Metal pipeline, and if you manipulate values without knowing which state you’re in, things can get screwed up fast.

As I learned from those earlier blog posts, there are a couple of options for handling gamma when rendering in wide color to an MTKView:

  1. convert your Display P3 color values to their linear (non-encoded) counterpart in sRGB, and allow the MTKView to apply the gamma encoding for you (by choosing pixel format .bgra10_xr_srgb), or
  2. convert the P3 values to linear sRGB and then pre-apply the gamma encoding yourself mathematically, choosing the pixel format .bgra10_xr.
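On the view side, the two options differ by a single property. A rough sketch, assuming an iOS target where both pixel formats are available; configureForWideColor and its parameter name are hypothetical and not taken from the demo:

```swift
import MetalKit

// Hypothetical helper: pick the pixel format that matches the chosen option.
func configureForWideColor(_ view: MTKView, letViewEncodeGamma: Bool) {
    // Option 1: .bgra10_xr_srgb, so linear values are gamma encoded on write.
    // Option 2: .bgra10_xr, so values are stored as given (already encoded).
    view.colorPixelFormat = letViewEncodeGamma ? .bgra10_xr_srgb : .bgra10_xr
}
```

Whichever format is chosen, the color attachment pixel format on the render pipeline descriptor needs to match the view’s colorPixelFormat.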

In this demo, this is the difference between converting the left corner’s “extended” color to 1.2249, -0.04203, -0.0196 (which is P3’s reddest red, converted to linear sRGB) and converting it to 1.0930, -0.2267, -0.1501 (P3’s reddest red as sRGB with gamma encoding applied; these are the numbers you would get if you used Apple’s ColorSync utility to convert to sRGB).
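Those two triples are related by the sRGB encoding curve applied to each channel, with the sign preserved for negative values. As a sanity check, here is a small sketch; gammaEncode is a standalone scalar stand-in for the gammaEncoded helper shown further down.

```swift
import Foundation

// sRGB encoding extended to negative inputs: encode the magnitude,
// then restore the sign.
func gammaEncode(_ c: Double) -> Double {
    let magnitude = abs(c)
    let encoded = magnitude <= 0.0031308
        ? magnitude * 12.92
        : 1.055 * pow(magnitude, 1 / 2.4) - 0.055
    return c < 0 ? -encoded : encoded
}

// P3's reddest red in linear extended sRGB, as quoted above.
let linearRed = [1.2249, -0.04203, -0.0196]

// Applying the curve per channel gives roughly 1.093, -0.227, -0.150.
let encodedRed = linearRed.map(gammaEncode)
print(encodedRed)
```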

While these conversions are probably best done in a shader, I only had three vertices to handle, so I did it in Swift code using matrix math (see below).

After trying options 1 and 2 above, I noticed an interesting difference in the visual results: when I let the MTKView apply gamma compression to my vertex colors (option 1, pictured above at left), the interior of the triangle was much lighter than when I used the technique in option 2 (right).

The issue was this: In option 1, not only were my triangle’s corners being assigned gamma-compressed values, but so were all of the pixels in between.

The way GPUs work is that values in between the defined vertices are computed automatically using a linear interpolation (or, strictly speaking, a barycentric interpolation) before being passed to the fragment shader.

After the interpolation (which occurred in linear space), the gamma encoding moved all of the pixels toward lighter intensities (higher numbers, closer to 1.0).

But when I applied gamma encoding to the converted vertex colors “by hand” (option 2) and set the MTKView to the colorPixelFormat of .bgra10_xr, only the corners were gamma encoded, and the interpolation was effectively done in gamma space. The result was a triangle whose corners were the same color as in option 1, but whose interior values were biased toward the dark end, because of the nature of the gamma function described above.
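The effect is easy to reproduce with a single channel running from 0 to 1 along an edge. This is a sketch with hypothetical helper names (gammaEncode, lerp), not the GPU’s actual code path; it only compares the order of the two operations at the midpoint.

```swift
import Foundation

// sRGB transfer function, linear -> gamma encoded, for non-negative inputs.
func gammaEncode(_ c: Double) -> Double {
    c <= 0.0031308 ? c * 12.92 : 1.055 * pow(c, 1 / 2.4) - 0.055
}

// Plain linear interpolation between a and b.
func lerp(_ a: Double, _ b: Double, _ t: Double) -> Double {
    a + (b - a) * t
}

// One color channel at the two ends of an edge, in linear space.
let a = 0.0
let b = 1.0

// Option 1: interpolate the linear values, then encode the result.
let option1Mid = gammaEncode(lerp(a, b, 0.5))               // roughly 0.735

// Option 2: encode the endpoints first, then interpolate.
let option2Mid = lerp(gammaEncode(a), gammaEncode(b), 0.5)  // roughly 0.5

print(option1Mid, option2Mid)
```

The endpoints come out the same either way, but the midpoint is noticeably lighter when the encoding happens after the interpolation.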

While neither option is necessarily wrong, you might argue that option 1 (interpolating in linear space) seems more natural, because light is additive in linear space.

Some specifics below:

Using this matrix and conversion functions from endavid

private static let linearP3ToLinearSRGBMatrix: matrix_float3x3 = {
    let col1 = float3([1.2249,  -0.2247,  0])
    let col2 = float3([-0.0420,   1.0419,  0])
    let col3 = float3([-0.0197,  -0.0786,  1.0979])
    return matrix_float3x3([col1, col2, col3])
}()

extension float3 {
    var gammaDecoded: float3 {
        let f = {(c: Float) -> Float in
            if abs(c) <= 0.04045 {
                return c / 12.92
            }
            return sign(c) * powf((abs(c) + 0.055) / 1.055, 2.4)
        }
        return float3(f(x), f(y), f(z))
    }

    var gammaEncoded: float3 {
        let f = {(c: Float) -> Float in
            if abs(c) <= 0.0031308 {
                return c * 12.92
            }
            return sign(c) * (powf(abs(c), 1/2.4) * 1.055 - 0.055)
        }
        return float3(f(x), f(y), f(z))
    }
}

…and a conversion function like this…

func toSRGB(_ p3: float3) -> float4 {
    // Note: gamma decoding not strictly necessary in this demo
    // because 0 and 1 always decode to 0 and 1
    let linearSrgb = p3.gammaDecoded * linearP3ToLinearSRGBMatrix
    let srgb = linearSrgb.gammaEncoded
    return float4(x: srgb.x, y: srgb.y, z: srgb.z, w: 1.0)
}

…the color adjustment went like this:

let p3red = float3([1.0, 0.0, 0.0])
let p3green = float3([0.0, 1.0, 0.0])
let p3blue = float3([0.0, 0.0, 1.0])

let vertex1 = Vertex(position: leftCorner, color: toSRGB(p3red))
let vertex2 = Vertex(position: top, color: toSRGB(p3green))
let vertex3 = Vertex(position: rightCorner, color: toSRGB(p3blue))

let myWideColorVertices = [vertex1, vertex2, vertex3]

I hope this port helps someone out there. And huge thanks to David Gavilan for his informative blog posts and for his incredibly helpful feedback on this post.
