Type-Safe Dynamic Mappings in Swift

We discovered a neat little pattern for statically validating config files, while still providing dynamic key-value access via Swift key paths.

The Problem

In our previous post, we evaluated many embedding models for sentiment analysis of product reviews. Under the hood, we developed a small but general embedding framework that accepts models in a variety of formats, and it can select any of the supported models based on a dynamic descriptor. For example, we support Model2Vec and Fastembed. A simplified set of descriptors might look like this in JSON:

{
    "mrlStaticMulti": {
        "backend": "fastembed",
        "path": "/path/to/MRLStaticMultilingual"
    },
    "potionRetrieval32M": {
        "backend": "model2vec",
        "path": "/path/to/PotionRetrieval32M"
    }
}

and the Swift type for the descriptor of an individual model might be:

struct EmbeddingModelDescriptor: Codable {
    let backend: EmbeddingModelBackend
    let path: String
}

The question now is: how can we address these models dynamically, yet in a type-safe way?

First Attempt: The Simpleton

The "obvious" choice for dynamic key-value lookup is a dictionary, mapping model names (strings) to model descriptors:

let models = try JSONDecoder().decode(
    [String: EmbeddingModelDescriptor].self,
    from: jsonData,
)

However, this comes at a price: even though the set of models is well-known and completely static, and so are their names, there is no way for us to safely address them. Dictionary lookups can fail, for example, if you make a spelling error in the model name, or change the name in the JSON file but forget to do so in the code that refers to it. So this approach is less than ideal.
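To make the failure mode concrete, here is a minimal sketch of the dictionary approach. It assumes a simplified descriptor whose `backend` is a plain `String` (the real `EmbeddingModelBackend` type is not shown in this post), and the JSON is inlined for illustration.

```swift
import Foundation

// Simplified stand-in for the post's descriptor; `backend` is a String
// here only because EmbeddingModelBackend is not shown in this post.
struct EmbeddingModelDescriptor: Codable {
    let backend: String
    let path: String
}

let jsonData = Data("""
{
    "mrlStaticMulti": {
        "backend": "fastembed",
        "path": "/path/to/MRLStaticMultilingual"
    }
}
""".utf8)

let models = try JSONDecoder().decode(
    [String: EmbeddingModelDescriptor].self,
    from: jsonData
)

// Correct spelling: the lookup succeeds, but the result is still Optional.
let found = models["mrlStaticMulti"]

// Typo ("Mulit"): this compiles without complaint and only shows up
// at runtime, as a nil.
let missing = models["mrlStaticMulit"]
```

Every call site has to unwrap the result, even though we know in advance exactly which models exist.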

Second Attempt: The Rigorous

So what's the next best choice, one which can enforce that you never make a typo in the key name? You guessed it: a struct. Statically-typed structs have field names, and the compiler validates every access to them. So our next attempt is a struct that simply spells out the name of each model as a separate field, with every field sharing the same descriptor type:

struct EmbeddingModelCollection: Codable {
    let mrlStaticMulti: EmbeddingModelDescriptor
    let potionRetrieval32M: EmbeddingModelDescriptor
}

let models = try JSONDecoder().decode(
    EmbeddingModelCollection.self,
    from: jsonData,
)

Okay, that's better: now we are guaranteed that we only ever access a model that actually exists, and JSON decoding will also validate the field names against the keys of the JSON object. However, we have now lost a different, useful property: the ability to address the models dynamically, based on runtime information.

So what can we do to have our cake and eat it too?

Final Solution: The Best of Both Worlds

The solution is actually pretty straightforward. We'll need to create a type that can translate itself into a KeyPath dynamically, which in turn we can use to look up any field in our struct via Swift's built-in subscripting support. (We can also use this to mutate fields if we declare all fields as var and use WritableKeyPath instead of a plain KeyPath.)
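If key paths are new to you, the following sketch shows the two built-in operations we rely on: reading through a KeyPath and writing through a WritableKeyPath. The `Settings` type is a made-up example and has nothing to do with the embedding framework.

```swift
// Hypothetical example type, used only to illustrate key paths.
struct Settings {
    var name: String
    var retries: Int
}

// A key path is an ordinary value, so it can be chosen at runtime.
let readName: KeyPath<Settings, String> = \.name
let writeRetries: WritableKeyPath<Settings, Int> = \.retries

var settings = Settings(name: "default", retries: 3)

// Reading works through any KeyPath.
let name = settings[keyPath: readName]

// Writing needs a WritableKeyPath, a `var` property, and a mutable value.
settings[keyPath: writeRetries] = 5
```

The compiler checks that each key path refers to a real property of the right type, which is exactly the guarantee we want to keep.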

The good news is: we already have a candidate for such a type! It's just Self.CodingKeys. By conforming our CodingKeys enum to the set of relevant protocols, we can make it dynamically decodable from strings, iterate over its cases, and use it to get the appropriate key path, like this:

struct EmbeddingModelCollection: Codable {
    let mrlStaticMulti: EmbeddingModelDescriptor
    let potionRetrieval32M: EmbeddingModelDescriptor

    enum CodingKeys: String, CodingKey, Codable, CaseIterable {
        case mrlStaticMulti
        case potionRetrieval32M
    }

    // notice that this is infallible - it unconditionally
    // returns a descriptor, it is not an `Optional`
    subscript(codingKey: CodingKeys) -> EmbeddingModelDescriptor {
        self[keyPath: codingKey.keyPath]
    }
}

extension EmbeddingModelCollection.CodingKeys {
    var keyPath: KeyPath<
        EmbeddingModelCollection, EmbeddingModelDescriptor
    > {
        switch self {
            case .mrlStaticMulti:     \.mrlStaticMulti
            case .potionRetrieval32M: \.potionRetrieval32M
        }
    }
}

We can also add a couple more protocols than strictly necessary, e.g. Equatable and Sendable, in order to make CodingKeys easier to work with, so it feels more like the built-in String.
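Here is a sketch of how this might be used at a call site. It repeats the types from above in condensed form so that it stands alone, with `backend` simplified to a `String` and the `keyPath` property moved inside the enum. The `requested` string stands in for whatever runtime input you have, such as a command-line argument.

```swift
import Foundation

// Simplified: the post uses EmbeddingModelBackend for `backend`.
struct EmbeddingModelDescriptor: Codable {
    let backend: String
    let path: String
}

struct EmbeddingModelCollection: Codable {
    let mrlStaticMulti: EmbeddingModelDescriptor
    let potionRetrieval32M: EmbeddingModelDescriptor

    enum CodingKeys: String, CodingKey, Codable, CaseIterable {
        case mrlStaticMulti
        case potionRetrieval32M

        var keyPath: KeyPath<EmbeddingModelCollection, EmbeddingModelDescriptor> {
            switch self {
            case .mrlStaticMulti: return \.mrlStaticMulti
            case .potionRetrieval32M: return \.potionRetrieval32M
            }
        }
    }

    subscript(codingKey: CodingKeys) -> EmbeddingModelDescriptor {
        self[keyPath: codingKey.keyPath]
    }
}

let collection = EmbeddingModelCollection(
    mrlStaticMulti: .init(
        backend: "fastembed", path: "/path/to/MRLStaticMultilingual"),
    potionRetrieval32M: .init(
        backend: "model2vec", path: "/path/to/PotionRetrieval32M")
)

// Hypothetical runtime input.
let requested = "potionRetrieval32M"

// The only fallible step is turning the string into a key.
var selectedPath: String? = nil
if let key = EmbeddingModelCollection.CodingKeys(rawValue: requested) {
    // From here on, the lookup itself cannot fail.
    selectedPath = collection[key].path
}

// CaseIterable lets us enumerate every model without listing them again.
let backends = EmbeddingModelCollection.CodingKeys.allCases.map {
    collection[$0].backend
}
```

The fallibility has moved to the boundary where the string is parsed. Once we hold a `CodingKeys` value, every lookup returns a descriptor, and forgetting a case in the `switch` is a compile-time error.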

The Extra Bit: Total Recall

If you followed this post super carefully, you might have noticed that we introduced yet another subtle problem by using a struct instead of a dictionary: we are no longer guaranteed to have all of our JSON data represented in the top-level struct. If we have a superset of the struct fields in our JSON config file, then the extra keys will simply be ignored. How can we mitigate this issue, and ensure that all JSON keys are mapped to a field of our struct?
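To see the problem concretely, consider this small sketch. The `Descriptor` and `Models` types and the `forgottenModel` key are made up for illustration; the point is only that decoding succeeds even though the JSON has a key the struct knows nothing about.

```swift
import Foundation

// Hypothetical, simplified stand-ins for the post's types.
struct Descriptor: Codable {
    let backend: String
    let path: String
}

struct Models: Codable {
    let mrlStaticMulti: Descriptor
}

let json = Data("""
{
    "mrlStaticMulti": { "backend": "fastembed", "path": "/a" },
    "forgottenModel": { "backend": "model2vec", "path": "/b" }
}
""".utf8)

// Decoding succeeds; "forgottenModel" is dropped without any error.
let models = try JSONDecoder().decode(Models.self, from: json)
```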

Unfortunately, the automatically synthesized Decodable conformance does not ensure completeness of the data model, and JSONDecoder has no public API for enforcing such a property either. Thus, we will have to write a bit of code manually. Fortunately, though, we do not have to (and therefore should not!) copy-paste the entire init(from: Decoder) function into every single type. We can devise a generic wrapper type instead, use it for the bulk of the validation logic, and then just defer to the compiler-synthesized implementation of Decodable.

The essence of this approach is included below. If you are curious to see a complete, working example, you can find it in this gist.

public protocol TotalCodingKey: Sendable, Equatable, Hashable,
    CaseIterable, CodingKey {}

public struct TotalDecodable<Collection, CodingKey> {
    var inner: Collection
}

extension TotalDecodable: Decodable
where
    Collection: Decodable,
    CodingKey: TotalCodingKey
{
    public init(from decoder: any Decoder) throws {
        let keyedContainer = try decoder.container(
            keyedBy: AnyCodingKey.self
        )

        let extraKeys = Set(keyedContainer.allKeys.map {
            $0.stringValue
        }).subtracting(
            CodingKey.allCases.map { $0.stringValue }
        )

        guard extraKeys.isEmpty else {
            // see the definition of ExtraKeysError in the full gist
            throw ExtraKeysError(extraKeys: extraKeys.sorted())
        }

        self.inner = try Collection.init(from: decoder)
    }
}

private struct AnyCodingKey: CodingKey, Equatable, Hashable {
    let stringValue: String
    let intValue: Int? = nil

    init?(intValue: Int) {
        return nil
    }

    init?(stringValue: String) {
        self.stringValue = stringValue
    }
}
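For a sense of how the wrapper might be used, here is a condensed, stand-alone sketch. It makes several assumptions: `ExtraKeysError` is a guess at the shape of the type in the gist, the generic parameters are renamed to `Wrapped` and `Key` to avoid shadowing the standard library names, and `Descriptor`, `Models`, and `typoModel` are made-up examples.

```swift
import Foundation

protocol TotalCodingKey: Sendable, Hashable, CaseIterable, CodingKey {}

// Assumed shape; the real definition lives in the gist.
struct ExtraKeysError: Error {
    let extraKeys: [String]
}

// A coding key that accepts any string, so we can list all JSON keys.
struct AnyCodingKey: CodingKey {
    let stringValue: String
    let intValue: Int? = nil
    init?(intValue: Int) { return nil }
    init?(stringValue: String) { self.stringValue = stringValue }
}

struct TotalDecodable<Wrapped: Decodable, Key: TotalCodingKey>: Decodable {
    var inner: Wrapped

    init(from decoder: any Decoder) throws {
        let container = try decoder.container(keyedBy: AnyCodingKey.self)
        // Keys present in the JSON but absent from the key enum.
        let extraKeys = Set(container.allKeys.map(\.stringValue))
            .subtracting(Key.allCases.map(\.stringValue))
        guard extraKeys.isEmpty else {
            throw ExtraKeysError(extraKeys: extraKeys.sorted())
        }
        // Defer to the synthesized implementation for the actual decoding.
        self.inner = try Wrapped(from: decoder)
    }
}

// Hypothetical model types for the demonstration.
struct Descriptor: Decodable {
    let backend: String
    let path: String
}

struct Models: Decodable {
    let mrlStaticMulti: Descriptor

    enum CodingKeys: String, TotalCodingKey {
        case mrlStaticMulti
    }
}

typealias StrictModels = TotalDecodable<Models, Models.CodingKeys>

let exactJSON = Data("""
{ "mrlStaticMulti": { "backend": "fastembed", "path": "/a" } }
""".utf8)

let sloppyJSON = Data("""
{
    "mrlStaticMulti": { "backend": "fastembed", "path": "/a" },
    "typoModel": { "backend": "model2vec", "path": "/b" }
}
""".utf8)

// Exactly the expected keys: decodes as usual.
let accepted = try JSONDecoder().decode(StrictModels.self, from: exactJSON)

// One unexpected key: decoding is rejected and the key is reported.
var rejectedKeys: [String] = []
do {
    _ = try JSONDecoder().decode(StrictModels.self, from: sloppyJSON)
} catch let error as ExtraKeysError {
    rejectedKeys = error.extraKeys
} catch {
    // Any other decoding error is ignored in this sketch.
}
```

Passing the key enum as a separate generic argument keeps the wrapper generic: it only needs the list of allowed keys, and the wrapped type keeps its ordinary synthesized Decodable conformance.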