Datatypes and schemas
As stated before, Smithy4s generates code that does not depend on any third-party library. However, we still want to use the generated code with specific serialisation technologies, such as JSON, Protocol Buffers, CBOR, MessagePack, or XML (yes ... we know).
We also want to avoid having to implement complex macros to allow for auto-derivation of these things. For starters, the reality is that maintaining macros across two different Scala versions (2 and 3) is hard work. Secondly, macros close the door to an interesting feature, namely "dynamic schematisation" that we'll describe in another chapter.
If you have 45 minutes to waste, feel free to watch the following video, in which Olivier explains the rationale behind the crazy pattern we are about to describe. Otherwise, read on!
The Schema GADT
Each datatype generated by Smithy4s is accompanied by a schema value in its companion object, which contains an expression of type smithy4s.schema.Schema that captures everything needed to deconstruct/reconstruct instances of the datatype.
smithy4s.schema.Schema is a Generalised Algebraic Datatype (or GADT for short) that can be used to precisely reference all the information needed to traverse datatypes that can be expressed in Smithy.
It is a bit like JVM reflection, except that it exposes higher-level information about the datatypes. It achieves this by exposing building blocks that accurately reflect what is possible to express in the Smithy language.
These building blocks form a metamodel: a model for models. And, unlike JVM reflection, using schemas is type-safe.
The Schema type reflects the various ways of constructing datatypes in Smithy.
It is encoded as a sealed trait, the members of which capture the following aspects of the Smithy language:
- Primitives
- Lists
- Maps
- Enumerations
- Structures
- Unions
For a Scala type called Foo, formulating a Schema[Foo] is equivalent to exhaustively capturing the information needed for the serialisation and deserialisation of Foo in any format (JSON, XML, ...). Indeed, for any Codec[_] construct provided by third-party libraries, it is possible to write a generic def compile[A](schema: Schema[A]): Codec[A] function that produces the Codec for A based on the information held by the Schema.
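To make the idea of a generic compile function concrete, here is a minimal, self-contained sketch. It does not use the real smithy4s.schema.Schema: the Schema, Visitor, JsonEncoder and JsonVisitor names below are toy stand-ins invented for illustration. They cover only two primitives and lists, and the traversal is written as a visitor rather than as a pattern match.

```scala
object SchemaSketch {
  // Describes how to build an F[A] for each kind of schema (hypothetical).
  trait Visitor[F[_]] {
    def int: F[Int]
    def string: F[String]
    def list[A](member: F[A]): F[List[A]]
  }

  // Toy stand-in for a schema type; NOT the real smithy4s API.
  sealed trait Schema[A] {
    def compile[F[_]](visitor: Visitor[F]): F[A]
  }
  case object IntSchema extends Schema[Int] {
    def compile[F[_]](visitor: Visitor[F]): F[Int] = visitor.int
  }
  case object StringSchema extends Schema[String] {
    def compile[F[_]](visitor: Visitor[F]): F[String] = visitor.string
  }
  final case class ListSchema[A](member: Schema[A]) extends Schema[List[A]] {
    def compile[F[_]](visitor: Visitor[F]): F[List[A]] =
      visitor.list[A](member.compile[F](visitor))
  }

  // Plays the role of a codec provided by some third-party library.
  trait JsonEncoder[A] { def encode(a: A): String }

  // One visitor per target format: this one produces JSON text encoders.
  object JsonVisitor extends Visitor[JsonEncoder] {
    val int: JsonEncoder[Int] = new JsonEncoder[Int] {
      def encode(a: Int): String = a.toString
    }
    val string: JsonEncoder[String] = new JsonEncoder[String] {
      // Simplification: no escaping of special characters.
      def encode(a: String): String = "\"" + a + "\""
    }
    def list[A](member: JsonEncoder[A]): JsonEncoder[List[A]] =
      new JsonEncoder[List[A]] {
        def encode(as: List[A]): String =
          as.map(member.encode).mkString("[", ",", "]")
      }
  }
}
```

With these definitions, SchemaSketch.ListSchema(SchemaSketch.IntSchema).compile(SchemaSketch.JsonVisitor) yields an encoder for List[Int]. Supporting another format would mean writing another visitor, with no change to the schema values themselves.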
Why do things this way? Why not just render Codec instances during code generation?
The reason is that we want the generated code to be completely decoupled from any serialisation format or library, and we want users to be able to wire that generated code in different ways without having to change anything in the build.
Moreover, this approach has proven to keep the investment needed for interop with each additional library bounded, and it offers really good testability.
Hints
In Smithy, all shapes (and members of composite shapes) can be annotated with traits. Smithy4s generically translates these annotations to instances of the corresponding generated classes, which means that Smithy4s supports generating code for user-defined traits that it has zero knowledge of.
So if you have the following Smithy description:
namespace example
@trait
structure metadata {
@required
description: String
}
@metadata(description: "This is my own integer shape")
integer MyInt
When processing this Smithy model, Smithy4s renders a case class Metadata(description: String), with an associated ShapeTag[Metadata] instance, and the following expression in the companion object of MyInt:
val hints = Hints(
Metadata("This is my own integer shape")
)
The smithy4s.Hints type is a polymorphic map that can hold shapes, keyed by ShapeTag.
A ShapeTag is a uniquely identified tag that uses referential equality.
Every schema can hold a Hints instance, which means that in addition to the datatype structures, schemas also offer an accurate reflection of the trait values that annotate shapes in the Smithy models.
Smithy4s uses these hints to implement interpreters.
For instance, the smithy.api#jsonName Smithy trait translates to a smithy.api.JsonName Scala type, which we can query from a Hints instance when implementing a Schema ~> JsonCodec transformation.
This gives users some control over the JSON serialisation of their datatypes.
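The following sketch illustrates the idea of a polymorphic map keyed by tags that rely on referential equality, and how an interpreter might consult it to pick a JSON field name. It is a simplified model written for this page: Tag, Hints, JsonName, jsonNameTag and fieldLabel are hypothetical names, not the actual smithy4s.Hints or ShapeTag implementations.

```scala
object HintsSketch {
  // A tag identifying one kind of hint. It does not override equals, so two
  // tags are equal only when they are the very same instance.
  final class Tag[A](val name: String)

  // A map from tags to values; the tag's type parameter records the type of
  // the value stored under it, which is what makes the cast in `get` safe.
  final class Hints private (underlying: Map[AnyRef, Any]) {
    def add[A](tag: Tag[A], value: A): Hints =
      new Hints(underlying.updated(tag, value))
    def get[A](tag: Tag[A]): Option[A] =
      underlying.get(tag).map(_.asInstanceOf[A])
  }
  object Hints {
    val empty: Hints = new Hints(Map.empty)
  }

  // Stand-in for the Scala type generated from a trait such as jsonName.
  final case class JsonName(value: String)
  val jsonNameTag: Tag[JsonName] = new Tag[JsonName]("smithy.api#jsonName")

  // What a JSON interpreter could do: prefer the hinted name when present.
  def fieldLabel(default: String, hints: Hints): String =
    hints.get(jsonNameTag).map(_.value).getOrElse(default)
}
```

A field carrying a JsonName("first_name") hint would be serialised under first_name, whereas a field without that hint keeps its default label.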
Structures
A structure, also referred to as product, or record, is a construct that groups several values together. Typically, it translates naturally to a case class.
namespace example
structure Foo {
@required
a: Integer
@length(min: 1)
b: String
}
...and the associated, generated Scala code:
package example
import smithy4s.schema.Schema._
case class Foo(a: Int, b: Option[String] = None)
object Foo extends smithy4s.ShapeTag.Companion[Foo] {
val id: smithy4s.ShapeId = smithy4s.ShapeId("example", "Foo")
implicit val schema: smithy4s.Schema[Foo] = struct(
int.required[Foo]("a", _.a),
string.optional[Foo]("b", _.b).addHints(smithy.api.Length(Some(1), None))
){
Foo.apply
}.withId(id)
}
As you can see, the Smithy structure translates quite naturally to a Scala case class.
Every member of the structure that has neither the @required trait nor a default value is rendered as an optional value defaulting to None. By default, Smithy4s sorts the fields before rendering the case class so that the required ones appear before the optional ones, a pragmatic decision that tends to improve UX for users.
In the schema, each field is associated with a reference to a schema (int, string, ...), a string label, and a lambda calling the case class accessor that allows the retrieval of the associated field value. Additionally, the constructor of the case class is also referenced in the Schema.
Typically, the accessors are needed for encoding the data, which involves destructuring it to access its individual components. The labels are there to cater to serialisation mechanisms like JSON or XML, where sub-components of a piece of data are labelled and nested under a larger block.
Conversely, the constructor is used for deserialisation, which involves reconstructing the data after all of its component values have been successfully deserialised.
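The roles of labels, accessors and the constructor can be sketched as follows. This is an illustration only: the Field type below is a hypothetical simplification (not the smithy4s Field), and a Map[String, String] stands in for a real wire format.

```scala
object StructSketch {
  final case class Foo(a: Int, b: Option[String] = None)

  // Simplified field: a label plus an accessor into the structure.
  final case class Field[S, A](label: String, get: S => A)

  val aField: Field[Foo, Int] = Field[Foo, Int]("a", _.a)
  val bField: Field[Foo, Option[String]] = Field[Foo, Option[String]]("b", _.b)

  // Encoding destructures the value through the accessors and nests each
  // component under its label. Absent optional fields are simply omitted.
  def encode(foo: Foo): Map[String, String] = {
    val required = Map(aField.label -> aField.get(foo).toString)
    val optional = bField.get(foo).map(b => bField.label -> b)
    required ++ optional
  }

  // Decoding first recovers every component, then calls the constructor.
  // It fails (None) when the required field is missing or malformed.
  def decode(doc: Map[String, String]): Option[Foo] =
    doc
      .get(aField.label)
      .flatMap(_.toIntOption)
      .map(a => Foo(a, doc.get(bField.label)))
}
```

Encoding Foo(1, Some("x")) gives Map("a" -> "1", "b" -> "x"), and decoding that map gives back the original value.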
Another detail is the presence of the addHints call on the field labelled b. This is due to the presence of the length trait (from the smithy.api namespace, aka the prelude) on the corresponding b member of the Smithy Foo shape.
Note related to optional and required
You may have noticed the required and optional methods, which create Field instances from Schemas in order to pass them to structures.
Since 0.18, the concept of Option in Smithy4s is backed by an OptionSchema member of the Schema GADT. Having Option as a first-class citizen has some advantages, as it makes it possible to support sparse collections.
The downside is that it also makes it possible to create schemas (and therefore codecs) that do not abide by round-tripping properties. Indeed, once data is on the wire, it's often impossible to distinguish Option[Option[Option[Int]]] from Option[Int]. If you need to distinguish between the presence of a null value and the absence of a value, Smithy4s provides an additional Nullable type in order to allow an extra level of nesting.
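The round-tripping problem can be shown with a small sketch. It assumes a hypothetical wire format in which a value is either the text null or a number; the encode and decode functions are illustrative and unrelated to the codecs Smithy4s actually derives.

```scala
object OptionSketch {
  // Both None and Some(None) have nothing to write except "null".
  def encode(value: Option[Option[Int]]): String =
    value.flatten.map(_.toString).getOrElse("null")

  // When reading "null" back, the decoder has to pick one interpretation.
  def decode(wire: String): Option[Option[Int]] =
    if (wire == "null") None
    else wire.toIntOption.map(i => Some(i))
}
```

Here encode(None) and encode(Some(None)) produce the same text, so decoding the result of encode(Some(None)) yields None rather than the original value.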
Unions
A union, also referred to as a coproduct or sum type, is a construct that expresses sealed polymorphism. It is the dual of a structure: where structures express that you have A AND B, unions express that you can have A OR B.
The way this is expressed in Smithy looks like this:
namespace example
union Bar {
a: Integer
b: String
}
This syntax hints at the default serialisation that AWS intends for unions expressed in Smithy, namely tagged unions.
Indeed, the AWS JSON-centric protocols specify that shapes like these should be serialised as objects with a single key/value entry, where the key receives the value of the tag. For instance, { "a": 1 } or { "b": "two" }.
There are some very relevant technical reasons for it: this way of encoding unions/co-products in JSON is arguably the best.
It may also be familiar to Circe users, as it's the default encoding of co-products in circe-generic.
Regarding the Scala code rendered by Smithy4s for the above Smithy specification, it looks like this:
package example
import smithy4s.schema.Schema._
sealed trait Bar extends scala.Product with scala.Serializable
object Bar extends smithy4s.ShapeTag.Companion[Bar] {
val id: smithy4s.ShapeId = smithy4s.ShapeId("example", "Bar")
case class ACase(a: Int) extends Bar
case class BCase(b: String) extends Bar
object ACase {
val hints: smithy4s.Hints = smithy4s.Hints.empty
val schema: smithy4s.Schema[ACase] = bijection(int.addHints(hints), ACase(_), _.a)
val alt = schema.oneOf[Bar]("a")
}
object BCase {
val hints: smithy4s.Hints = smithy4s.Hints.empty
val schema: smithy4s.Schema[BCase] = bijection(string.addHints(hints), BCase(_), _.b)
val alt = schema.oneOf[Bar]("b")
}
implicit val schema: smithy4s.Schema[Bar] = union(
ACase.alt,
BCase.alt,
){
case _: ACase => 0
case _: BCase => 1
}.withId(id)
}
The union is rendered as an ADT (sealed trait), the members of which are single-value case classes wrapping values of the types referenced by the union members.
The Case suffix is added as a way to reduce the risk of collision between the generated code and other types (especially the types being wrapped).
Each ADT member is accompanied by its own schema, which is not provided implicitly, in an effort to retain coherence in the type-class instances,
and avoid the situation where you'd have different behaviours during serialisation based on whether you've up-casted a member to the ADT.
Additionally, the companion object of each ADT member contains an alt value (for "alternative"), which is the union's equivalent to the structure's field.
Much like a field, an alt contains a label, and can carry hints. But unlike a field, which contains an accessor, the alt contains the function to "inject" (up-cast) the member into the union.
This is useful for de-serialisation, when, after successfully de-serialising a member of a union, you need to inject it into the ADT to return the expected type.
As for the union's schema, it is somewhat similar to the structure's, in that it references all its alternatives. But instead of a structure's constructor, it holds a dispatch function, which pattern-matches against all the possible members and maps the "down-casted" value to its corresponding ordinal, making it possible to recover the corresponding alternative. This is useful for serialisation, when the behaviour of the alternatives can only be applied to values of the corresponding type: "if my ADT is an A, then I serialise the A, and add a discriminating tag to the serialised A".
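As an illustration of injection and dispatch, here is a self-contained sketch of a tagged-union encoder and decoder for Bar. The Alt type is a hypothetical simplification (label and inject function only), and the JSON rendering is done by naive string concatenation rather than by a real JSON library.

```scala
object UnionSketch {
  sealed trait Bar
  final case class ACase(a: Int) extends Bar
  final case class BCase(b: String) extends Bar

  // Simplified alternative: a label and the function that injects
  // (up-casts) a member value into the union.
  final case class Alt[A](label: String, inject: A => Bar)

  val aAlt: Alt[Int] = Alt[Int]("a", ACase(_))
  val bAlt: Alt[String] = Alt[String]("b", BCase(_))
  val labels: Vector[String] = Vector(aAlt.label, bAlt.label)

  // Dispatch: maps each member to the ordinal of its alternative.
  def ordinal(bar: Bar): Int = bar match {
    case _: ACase => 0
    case _: BCase => 1
  }

  // Serialisation: render the member, then add the discriminating tag
  // recovered through the ordinal. Strings are not escaped here.
  def encode(bar: Bar): String = {
    val payload = bar match {
      case ACase(a) => a.toString
      case BCase(b) => "\"" + b + "\""
    }
    "{\"" + labels(ordinal(bar)) + "\":" + payload + "}"
  }

  // Deserialisation: once the tag and the raw payload are known, decode the
  // member and inject it into the union to get the expected type back.
  def decode(tag: String, payload: String): Option[Bar] =
    if (tag == aAlt.label) payload.toIntOption.map(aAlt.inject)
    else if (tag == bAlt.label) Some(bAlt.inject(payload))
    else None
}
```

Encoding ACase(1) gives {"a":1}, and decode("b", "two") gives Some(BCase("two")), matching the tagged-union format described above.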
Named simple shapes
Smithy allows for the creation of named shapes that reference "primitive types":
namespace example
integer MyInt
Smithy4s translates this to a Scala newtype: a zero-overhead wrapper for the underlying type (in this case, Int):
package example
import smithy4s.Newtype
import smithy4s.schema.Schema._
object MyInt extends Newtype[Int] {
val id: smithy4s.ShapeId = smithy4s.ShapeId("example", "MyInt")
val hints: smithy4s.Hints = smithy4s.Hints.empty
val underlyingSchema: smithy4s.Schema[Int] = int.withId(id).addHints(hints)
implicit val schema: smithy4s.Schema[MyInt] = bijection(underlyingSchema, MyInt(_), (_: MyInt).value)
}
A MyInt type alias, pointing to the MyInt.Type type member, is rendered in the example package object, which makes it possible to write code such as:
val myInt: MyInt = MyInt(1)
// val int: Int = myInt // doesn't compile because MyInt is not an Int at compile time.
val int: Int = myInt.value
You may have noticed that the schema value is using bijection. In addition to the GADT members listed previously, Schema also has a BijectionSchema member, which makes it possible to apply a bidirectional transformation to other Schemas.
This is useful for the case of newtypes: if we are able to derive a codec that can encode and decode Int, it should be possible to derive a codec that encodes and decodes MyInt.
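The reasoning behind bijection can be sketched with a toy codec. Codec, biject and the MyInt case class below are hypothetical stand-ins written for this example; in particular, MyInt is modelled as a plain case class rather than as the zero-overhead newtype that Smithy4s generates.

```scala
object BijectionSketch {
  trait Codec[A] { self =>
    def encode(a: A): String
    def decode(s: String): Option[A]

    // Given conversions in both directions, a Codec[A] yields a Codec[B].
    def biject[B](to: A => B, from: B => A): Codec[B] = new Codec[B] {
      def encode(b: B): String = self.encode(from(b))
      def decode(s: String): Option[B] = self.decode(s).map(to)
    }
  }

  val intCodec: Codec[Int] = new Codec[Int] {
    def encode(a: Int): String = a.toString
    def decode(s: String): Option[Int] = s.toIntOption
  }

  // Stand-in for the generated newtype.
  final case class MyInt(value: Int)

  // No new encoding logic is needed: wrapping and unwrapping is enough.
  val myIntCodec: Codec[MyInt] = intCodec.biject[MyInt](MyInt(_), _.value)
}
```

The derived codec writes MyInt(42) exactly as the underlying Int codec writes 42, and reads it back by wrapping the decoded Int.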
Collections
Smithy supports two types of collections out of the box:
- list
- map
NB: the "set" type was supported in Smithy 1.0, but has disappeared in Smithy 2.0 in favour of the uniqueItems trait.
Additionally, Smithy4s allows users to annotate list shapes to customise the type of collection used during code-generation.
Smithy does not support generics, therefore all collections are named. Though seemingly tedious, it makes it easier to build tooling (and probably helps languages that do not support generics). Given the following shape:
namespace example
list IntList {
member: Integer
}
You get the following Scala code:
package example
import smithy4s.Newtype
import smithy4s.schema.Schema._
object IntList extends Newtype[List[Int]] {
val id: smithy4s.ShapeId = smithy4s.ShapeId("example", "IntList")
val hints: smithy4s.Hints = smithy4s.Hints.empty
val underlyingSchema: smithy4s.Schema[List[Int]] = list(int).withId(id).addHints(hints)
implicit val schema: smithy4s.Schema[IntList] = bijection(underlyingSchema, IntList(_), (_: IntList).value)
}
It is really similar to named primitives. However, for pragmatic reasons, when a structure references a collection in one of its members, the Scala field gets rendered using the de-aliased type (as opposed to the newtype). The IntList newtype is generated mostly as a way to hold the hints and schemas corresponding to the Smithy IntList shape. Additionally, such collection newtypes are used by Smithy4s to render Hints values. For instance, the following specification:
namespace example
@trait
list info {
member: String
}
@info(["foo", "bar", "baz"])
structure A {}
would lead to the following code being rendered in the companion object of A:
val hints: Hints = Hints(
example.Info(List("foo", "bar", "baz")),
)
This makes it possible to query Hints for Info using the following syntax: hints.get(example.Info)
Regarding the underlyingSchema value in the companion object of IntList, you can see that it is constructed using a list function. Conceptually, it encodes this: "if I'm able to encode or decode an A in a specific format, then I should be able to encode or decode a List[A]".
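On the decoding side, that statement can be sketched as a function lifting a decoder for A into a decoder for List[A]. The Decoder alias, the comma-separated input format and the list function below are assumptions made for this example and are not part of Smithy4s.

```scala
object ListSketch {
  // A decoder reads text and may fail.
  type Decoder[A] = String => Option[A]

  val intDecoder: Decoder[Int] = _.trim.toIntOption

  // Lifts a member decoder into a decoder for comma-separated lists.
  // The whole list fails to decode as soon as one element fails.
  def list[A](member: Decoder[A]): Decoder[List[A]] = { input =>
    val parts = if (input.trim.isEmpty) Nil else input.split(',').toList
    parts.foldRight(Option(List.empty[A])) { (part, acc) =>
      for {
        tail <- acc
        head <- member(part)
      } yield head :: tail
    }
  }
}
```

For example, list(intDecoder) accepts "1, 2,3" and rejects "1,x", without the list logic knowing anything about integers.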
Enumerations
Smithy allows for two types of enumerations: string and integer enumerations.
Additionally, Smithy4s supports specifying whether an enumeration is open or closed. An open enumeration allows for holding unknown values, whereas a closed one is strictly limited to a set of specified values. This brings the total number of possible "flavours" of enumerations to 4, which is reified via a smithy4s.schema.EnumTag ADT that comprises 4 different cases: one for each combination of [open, closed] and [int, string].
Enumerations are typically modelled as Algebraic Data Types. Each case of an enumeration is associated with both a String and an Int value. In the case of intEnum, the string value is the name of the case. In the case of a normal (string) enum, the integer value is the index of the case in the list.
Additionally, each enumeration case holds its own hints.
Closed enumerations
Given this Smithy code:
namespace smithy4s.example
intEnum Numbers {
ONE = 1
TWO = 2
}
The corresponding generated Scala code is:
sealed abstract class Numbers(_value: String, _name: String, _intValue: Int, _hints: Hints) extends Enumeration.Value {
override type EnumType = Numbers
override val value: String = _value
override val name: String = _name
override val intValue: Int = _intValue
override val hints: Hints = _hints
override def enumeration: Enumeration[EnumType] = Numbers
@inline final def widen: Numbers = this
}
object Numbers extends Enumeration[Numbers] with ShapeTag.Companion[Numbers] {
val id: ShapeId = ShapeId("smithy4s.example", "Numbers")
val hints: Hints = Hints.empty
case object ONE extends Numbers("ONE", "ONE", 1, Hints())
case object TWO extends Numbers("TWO", "TWO", 2, Hints())
val values: List[Numbers] = List(
ONE,
TWO,
)
val tag: EnumTag[Numbers] = EnumTag.ClosedIntEnum
implicit val schema: Schema[Numbers] = enumeration(tag, values).withId(id).addHints(hints)
}
Open enumerations
Given this Smithy code:
namespace smithy4s.example
use alloy#openEnum
@openEnum
intEnum OpenNums {
ONE = 1
TWO = 2
}
The corresponding generated Scala code is:
package smithy4s.example
import smithy4s.Enumeration
import smithy4s.Hints
import smithy4s.Schema
import smithy4s.ShapeId
import smithy4s.ShapeTag
import smithy4s.schema.EnumTag
import smithy4s.schema.Schema.enumeration
sealed abstract class OpenNums(_value: String, _name: String, _intValue: Int, _hints: Hints) extends Enumeration.Value {
override type EnumType = OpenNums
override val value: String = _value
override val name: String = _name
override val intValue: Int = _intValue
override val hints: Hints = _hints
override def enumeration: Enumeration[EnumType] = OpenNums
@inline final def widen: OpenNums = this
}
object OpenNums extends Enumeration[OpenNums] with ShapeTag.Companion[OpenNums] {
val id: ShapeId = ShapeId("smithy4s.example", "OpenNums")
val hints: Hints = Hints(
alloy.OpenEnum(),
)
case object ONE extends OpenNums("ONE", "ONE", 1, Hints())
case object TWO extends OpenNums("TWO", "TWO", 2, Hints())
final case class $Unknown(int: Int) extends OpenNums("$Unknown", "$Unknown", int, Hints.empty)
val $unknown: Int => OpenNums = $Unknown(_)
val values: List[OpenNums] = List(
ONE,
TWO,
)
val tag: EnumTag[OpenNums] = EnumTag.OpenIntEnum($unknown)
implicit val schema: Schema[OpenNums] = enumeration(tag, values).withId(id).addHints(hints)
}
As you can see, the main difference between the two is the presence of a final case class $Unknown ADT member in the open enumeration, which captures values that are not defined in the specification.
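The practical consequence for decoding can be sketched as follows. This is a simplified, self-contained model: the class below keeps only a name and an integer value, and decodeClosed and decodeOpen are hypothetical helpers, not functions provided by Smithy4s.

```scala
object EnumSketch {
  sealed abstract class OpenNums(val name: String, val intValue: Int)
  case object ONE extends OpenNums("ONE", 1)
  case object TWO extends OpenNums("TWO", 2)
  // Extra member that only an open enumeration has.
  final case class Unknown(int: Int) extends OpenNums("$Unknown", int)

  // Only the cases defined in the specification.
  val values: List[OpenNums] = List(ONE, TWO)

  // Closed behaviour: values outside the specification are rejected.
  def decodeClosed(i: Int): Option[OpenNums] =
    values.find(_.intValue == i)

  // Open behaviour: values outside the specification are preserved.
  def decodeOpen(i: Int): OpenNums =
    decodeClosed(i).getOrElse(Unknown(i))
}
```

With these definitions, decodeClosed(3) fails with None, whereas decodeOpen(3) returns Unknown(3), so the unexpected value can be inspected or passed along instead of being rejected.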