Services and endpoints

In addition to relying heavily on a Schema construct, which abstracts over serialisation, Smithy4s uses abstractions that codify the notion of an interface, so that it can interoperate with various communication protocols. The idea is to reason generically about things of this shape:

trait Interface[Context[_]]{
  def operation1(a: A, b: B): Context[Output1]
  def operation2(c: C, d: D, e: E): Context[Output2]
}

This generalisation enables the easy interpretation of implementations of such interfaces into services (HTTP, RPC, etc), or conversely, the derivation of stub instances of these interfaces to talk to remote services.

The creation of an abstraction that allows for this generalisation is a problem similar to the one that led to the Schema construct: one needs to deconstruct the notion of "interface" into fundamental building blocks.

The duality of final and initial algebras

Before we dive into the core of the solution, one notion that is particularly helpful is the duality between finally-encoded algebras and initially-encoded algebras.

Finally-encoded algebras are object-oriented encodings of a set of operations, just like above: operations are represented as methods in an interface. Interpretation of expressions written in terms of these methods does not involve any runtime transformation from one context to another: the method call is merely executed. In other words, when they are executed, expressions coming from finally-encoded algebras are already in their "final form".
Conversely, initially-encoded algebras represent expressions as data, implying that interpretation involves a transformation of this data into lower-level method calls. However, data has the quality of being a first-class construct in programming languages, meaning you can pass it around and use it as a parameter to functions. This allows for the unification of code paths, as the differences between some aspects of a bit of logic can be absorbed by the data and handled later on.

Finally-encoded KVStore algebra :

trait KVStore[Context[_]]{
  def put(key: String, value: String): Context[Unit]
  def get(key: String)                : Context[Option[String]]
  def delete(key: String)             : Context[Unit]
}

Initially-encoded KVStore algebra :

sealed trait KVStoreOp[Output]
object KVStoreOp {
  case class Put(key: String, value: String)  extends KVStoreOp[Unit]
  case class Get(key: String)                 extends KVStoreOp[Option[String]]
  case class Delete(key: String)              extends KVStoreOp[Unit]
}

These two encodings contain a similar amount of information. It is nearly trivial to go from a KVStore[Context] instance to a KVStoreOp ~> Context polymorphic function (natural transformation), and vice versa:

trait ~>[F[_], G[_]]{
  def apply[A](fa: F[A]): G[A]
}

def asNaturalTransformation[Context[_]](impl: KVStore[Context]) = new (KVStoreOp ~> Context){
  def apply[A](fa: KVStoreOp[A]): Context[A] = fa match {
    case KVStoreOp.Put(key, value) => impl.put(key, value)
    case KVStoreOp.Get(key)        => impl.get(key)
    case KVStoreOp.Delete(key)     => impl.delete(key)
  }
}

def fromNaturalTransformation[Context[_]](run: KVStoreOp ~> Context) = new KVStore[Context]{
  def put(key: String, value: String) = run(KVStoreOp.Put(key, value))
  def get(key: String)                = run(KVStoreOp.Get(key))
  def delete(key: String)             = run(KVStoreOp.Delete(key))
}

This duality is heavily used by Smithy4s: finally-encoded interfaces are generally more natural to Scala developers, and are better supported in editors (autocompletion, etc). But from an implementation's perspective, the initial, data-based encoding is really interesting, because operations are reified as data-types that can be associated with instances of generic type-classes: it is possible to abstract over data, it is not possible to abstract over method calls.
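To make the point about abstracting over data concrete, here is a minimal sketch that repeats the definitions above in a self-contained form and adds a recording wrapper. The names Id, Recording and inMemory are illustrative assumptions, not part of Smithy4s. The wrapper is written once against the data encoding and therefore applies to every operation, which would require one override per method in the final encoding.

```scala
import scala.collection.mutable

object DualityExample {
  type Id[A] = A

  trait ~>[F[_], G[_]] {
    def apply[A](fa: F[A]): G[A]
  }

  trait KVStore[Context[_]] {
    def put(key: String, value: String): Context[Unit]
    def get(key: String): Context[Option[String]]
    def delete(key: String): Context[Unit]
  }

  sealed trait KVStoreOp[Output]
  object KVStoreOp {
    case class Put(key: String, value: String) extends KVStoreOp[Unit]
    case class Get(key: String) extends KVStoreOp[Option[String]]
    case class Delete(key: String) extends KVStoreOp[Unit]
  }

  def asNaturalTransformation[Context[_]](impl: KVStore[Context]): KVStoreOp ~> Context =
    new (KVStoreOp ~> Context) {
      def apply[A](fa: KVStoreOp[A]): Context[A] = fa match {
        case KVStoreOp.Put(key, value) => impl.put(key, value)
        case KVStoreOp.Get(key)        => impl.get(key)
        case KVStoreOp.Delete(key)     => impl.delete(key)
      }
    }

  def fromNaturalTransformation[Context[_]](run: KVStoreOp ~> Context): KVStore[Context] =
    new KVStore[Context] {
      def put(key: String, value: String): Context[Unit] = run(KVStoreOp.Put(key, value))
      def get(key: String): Context[Option[String]] = run(KVStoreOp.Get(key))
      def delete(key: String): Context[Unit] = run(KVStoreOp.Delete(key))
    }

  // Hypothetical wrapper: records every operation before delegating.
  // It is written once, against the data encoding, for all operations.
  final class Recording[Context[_]](underlying: KVStoreOp ~> Context)
      extends (KVStoreOp ~> Context) {
    val seen: mutable.ListBuffer[String] = mutable.ListBuffer.empty
    def apply[A](op: KVStoreOp[A]): Context[A] = {
      seen += op.toString
      underlying(op)
    }
  }

  // Hypothetical in-memory implementation of the finally-encoded interface.
  val inMemory: KVStore[Id] = new KVStore[Id] {
    private val data = mutable.Map.empty[String, String]
    def put(key: String, value: String): Unit = data.update(key, value)
    def get(key: String): Option[String] = data.get(key)
    def delete(key: String): Unit = { data.remove(key); () }
  }

  // final -> initial (to add recording) -> final (to keep the nice interface)
  val recorder = new Recording[Id](asNaturalTransformation[Id](inMemory))
  val store: KVStore[Id] = fromNaturalTransformation[Id](recorder)

  store.put("a", "1")
  val found: Option[String] = store.get("a")
  store.delete("a")
  val afterDelete: Option[String] = store.get("a")
}
```

The caller only ever sees the KVStore interface, while the cross-cutting behaviour lives in a single apply method.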

A detour around kinds

The methods generated by Smithy4s are conceptually similar to the methods expressed in the example above, except that the output types are significantly more verbose.

trait Interface[Context[_, _, _, _, _]]{
  def operation1(a: A, b: B): Context[Input, Error, Output, StreamedInput, StreamedOutput]
}

Let's address this awkwardness right away, by explaining the rationale behind this seemingly humongous signature :

Input

It's the input type of an operation. Typically, a case class that holds fields matching the method parameters. We keep track of it in the return type for several reasons:

In the internal logic of Smithy4s, it avoids having to prematurely shoe-horn kinds into other kinds by means of injection/projection, which helps implementor and compiler alike
It will come in handy for the implementation of some pagination-aware interpreters, as pagination typically works by performing a modification of the previous input in order to get the next batch (page) of results. This implies that the input (and therefore its type) must be tracked across several requests resulting from a single method call.

Error

The execution of an operation can result in errors. The Smithy language allows for tying a list of errors to operations. When generating the associated code, Smithy4s synthesizes a union. This allows the coproduct of errors associated with an operation to be represented as a bona fide Scala type, which we can abstract over via some type-class instance. This is also very useful for writing bi-functor interpreters, for users who are interested in this kind of UX.

Output

No surprise there: this is the data resulting from the run of the operation.

StreamedInput, StreamedOutput

Smithy supports the concept of streaming. It is communicated as a trait that annotates a single field of the input shape and/or output shape of an operation. Scala does not have a "standard" way of expressing streaming semantics. Moreover, streaming constructs in Scala are heavily context-dependent. It is therefore impossible for us to incorporate the concept of "streaming" into our Schema construct, as it is meant to be context-free and third-party-free.

To get some intuition for why that is: say we want to express streaming using fs2. If we naively generate a case class that has one of its fields annotated with @streaming, it means that the field is of type fs2.Stream[F, A], which means that we either need to make a decision on what the F is, which is not okay for obvious reasons, or we need to propagate the F[_] type parameter upward to the case class. Now our Schema value, which accompanies the case class, also has to carry the F ... this propagates throughout the whole codebase. We deemed that unacceptable.

Rather than polluting all layers of abstraction, we decided that only the concept of operation would be impacted, holding the streamed type in a separate type parameter. This allows for interpreters from various ecosystems to emerge. It also has the quality of allowing users to access the unary component of outputs (i.e., data that is communicated in the headers of HTTP responses) without necessarily allocating resources to consume the streamed component of the output.
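As a rough illustration of that decision, the sketch below keeps the unary shapes free of any streaming type and records the streamed element type only in the operation's last type parameter. Every name in it (FileOp, Download, WithLazyList, runDownload) is hypothetical, and the standard library's LazyList stands in for a real streaming construct such as fs2.Stream.

```scala
object StreamingKindExample {
  // Hypothetical unary shapes: plain, context-free case classes.
  final case class DownloadInput(fileName: String)
  final case class DownloadOutput(contentType: String)

  // The streamed element type (Byte) appears only in the operation's
  // last type parameter, not as a field of DownloadOutput.
  sealed trait FileOp[I, E, O, SI, SO]
  final case class Download(input: DownloadInput)
      extends FileOp[DownloadInput, Nothing, DownloadOutput, Nothing, Byte]

  // One possible interpreter result: unary output paired with a lazy stream.
  // Another ecosystem could choose a different shape without touching the
  // case classes above.
  type WithLazyList[I, E, O, SI, SO] = (O, LazyList[SO])

  // Counts how many streamed elements have actually been produced.
  var chunksPulled = 0

  def runDownload(
      op: Download
  ): WithLazyList[DownloadInput, Nothing, DownloadOutput, Nothing, Byte] = {
    val body = LazyList.continually { chunksPulled += 1; 0.toByte }.take(3)
    (DownloadOutput("application/octet-stream"), body)
  }

  val (headers, body) = runDownload(Download(DownloadInput("report.pdf")))

  // The unary component is available before any streamed element is produced.
  val pulledBeforeConsuming: Int = chunksPulled
  val bytes: List[Byte] = body.toList
}
```

The unary part (headers) can be inspected while the streamed part remains untouched, which mirrors the HTTP headers case mentioned above.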

NB: at the time of writing, Smithy4s does not have any streaming-aware interpreter implemented. But streaming is such a fundamental notion in remote interactions that we had to devise a plan to ensure that third parties could decide to implement interpreters without waiting.

Transformation

Because of the complex kinds we're dealing with, we codify a polymorphic function (natural transformation) called smithy4s.kinds.PolyFunction5, which allows us to work at this level:

trait PolyFunction5[F[_, _, _, _, _], G[_, _, _, _, _]] {
  def apply[I, E, O, SI, SO](fa: F[I, E, O, SI, SO]): G[I, E, O, SI, SO]
}

This is a mouthful, but conceptually, it's exactly the same as our good old polymorphic function typically aliased to ~>.
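As a small sketch of what such a transformation can look like, the example below redefines the trait locally and converts between two five-parameter contexts. ErrorOrOutput and MaybeOutput are made-up type aliases for the purpose of the illustration; they simply ignore the type parameters they do not need.

```scala
object PolyFunction5Example {
  trait PolyFunction5[F[_, _, _, _, _], G[_, _, _, _, _]] {
    def apply[I, E, O, SI, SO](fa: F[I, E, O, SI, SO]): G[I, E, O, SI, SO]
  }

  // Hypothetical contexts that use only some of the five parameters.
  type ErrorOrOutput[I, E, O, SI, SO] = Either[E, O]
  type MaybeOutput[I, E, O, SI, SO] = Option[O]

  // Works for every operation at once, whatever its I, E, O, SI and SO are.
  val dropErrors: PolyFunction5[ErrorOrOutput, MaybeOutput] =
    new PolyFunction5[ErrorOrOutput, MaybeOutput] {
      def apply[I, E, O, SI, SO](
          fa: ErrorOrOutput[I, E, O, SI, SO]
      ): MaybeOutput[I, E, O, SI, SO] = fa.toOption
    }

  val ok: Option[Int] =
    dropErrors.apply[Unit, String, Int, Nothing, Nothing](Right(42))
  val ko: Option[Int] =
    dropErrors.apply[Unit, String, Int, Nothing, Nothing](Left("boom"))
}
```

Type aliases of this kind are one way to plug simpler contexts into the five-parameter shape.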

Codifying the duality between initial and final algebras

What we want users to manipulate is the final-encoded version of a service: a good-old object-oriented interface that has decent editor support. But we need the initial-encoded version to implement interpreters in a generic fashion.

So we codify the duality, to allow for switching from one encoding to the other, via an abstraction called smithy4s.Service, which is the entry point to all interpreters.

trait Service[Final[_[_, _, _, _, _]]] {
  type Operation[_, _, _, _, _]
  def toPolyFunction[F[_, _, _, _, _]](alg: Final[F]): PolyFunction5[Operation, F]
  def fromPolyFunction[F[_, _, _, _, _]](polyFunction: PolyFunction5[Operation, F]): Final[F]

  // ...
}

Implementations of such interfaces are typically code-generated. This implies that any Smithy service shape gets translated into a finally-encoded interface, but also into an initially-encoded GADT.
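To give an idea of what such an implementation might look like, here is a hand-written sketch for a KVStore reduced to a single operation. It redefines simplified versions of PolyFunction5 and Service locally; KVStoreGen, KVStoreService, OutputOnly and the run method are assumptions made for the illustration and should not be read as the exact code Smithy4s generates.

```scala
object ServiceExample {
  trait PolyFunction5[F[_, _, _, _, _], G[_, _, _, _, _]] {
    def apply[I, E, O, SI, SO](fa: F[I, E, O, SI, SO]): G[I, E, O, SI, SO]
  }

  trait Service[Final[_[_, _, _, _, _]]] {
    type Operation[_, _, _, _, _]
    def toPolyFunction[F[_, _, _, _, _]](alg: Final[F]): PolyFunction5[Operation, F]
    def fromPolyFunction[F[_, _, _, _, _]](polyFunction: PolyFunction5[Operation, F]): Final[F]
  }

  final case class GetInput(key: String)
  final case class GetOutput(value: Option[String])

  // Finally-encoded interface (a single operation, for brevity).
  trait KVStoreGen[F[_, _, _, _, _]] {
    def get(key: String): F[GetInput, Nothing, GetOutput, Nothing, Nothing]
  }

  // Initially-encoded dual; `run` dispatches to the matching method.
  sealed trait KVStoreOp[I, E, O, SI, SO] {
    def run[F[_, _, _, _, _]](impl: KVStoreGen[F]): F[I, E, O, SI, SO]
  }
  object KVStoreOp {
    final case class Get(input: GetInput)
        extends KVStoreOp[GetInput, Nothing, GetOutput, Nothing, Nothing] {
      def run[F[_, _, _, _, _]](
          impl: KVStoreGen[F]
      ): F[GetInput, Nothing, GetOutput, Nothing, Nothing] = impl.get(input.key)
    }
  }

  object KVStoreService extends Service[KVStoreGen] {
    type Operation[I, E, O, SI, SO] = KVStoreOp[I, E, O, SI, SO]

    def toPolyFunction[F[_, _, _, _, _]](alg: KVStoreGen[F]): PolyFunction5[Operation, F] =
      new PolyFunction5[Operation, F] {
        def apply[I, E, O, SI, SO](op: Operation[I, E, O, SI, SO]): F[I, E, O, SI, SO] =
          op.run[F](alg)
      }

    def fromPolyFunction[F[_, _, _, _, _]](
        polyFunction: PolyFunction5[Operation, F]
    ): KVStoreGen[F] =
      new KVStoreGen[F] {
        def get(key: String): F[GetInput, Nothing, GetOutput, Nothing, Nothing] =
          polyFunction.apply[GetInput, Nothing, GetOutput, Nothing, Nothing](
            KVStoreOp.Get(GetInput(key))
          )
      }
  }

  // Hypothetical context that keeps only the output of an operation.
  type OutputOnly[I, E, O, SI, SO] = O

  val impl: KVStoreGen[OutputOnly] = new KVStoreGen[OutputOnly] {
    def get(key: String): GetOutput = GetOutput(Some(key.toUpperCase))
  }

  // final -> initial -> final round trip
  val asData = KVStoreService.toPolyFunction[OutputOnly](impl)
  val roundTripped: KVStoreGen[OutputOnly] =
    KVStoreService.fromPolyFunction[OutputOnly](asData)
  val result: GetOutput = roundTripped.get("key")
}
```

Dispatching through a run method, rather than pattern matching on the GADT, keeps the sketch independent of how well the compiler refines five type parameters at once.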

The high-level philosophy of Smithy4s

The goal of Smithy4s is to allow users to derive client stubs and routers in various protocols, by running the generated code (or instances of generated interfaces) through some one-liner functions. To that end, Smithy4s surfaces a number of abstractions (such as smithy4s.schema.Schema) that allow for the implementation of (very) polymorphic interpreters. These interpreters operate on the generated code, which reflects what the user defines in their Smithy specs.

The abstractions used by interpreters contain all the elements that allow for turning a high-level method call (from an interface generated by Smithy4s) into a low level request of some sort, and then transform a low level response into the output of the method call.

Logical flow: client-side

Conceptually, to derive a high-level client that uses some sort of Request => Response protocol, the implementation has to follow a sequence of steps:

Assuming this method call: kvstore.get("key")
Turning the method call into a piece of data: KVStoreOp.Get("key"), using the initially-encoded dual of the KVStore interface
Running Get through a function that gives us its GetInput representation
Retrieving the Smithy4s Schemas (input and output) associated with the Get operation
Compiling the schema associated with the input of the Get operation into an encoding function: GetInput => Request
Running the request through a low-level Request => Response function (like an HTTP client)
Compiling the schema associated with the output (GetOutput ~= Option[String]) of the Get operation into a decoding function Response => GetOutput, and running the response through it

So we get kvstore.get => KVStoreOp.Get => GetInput => Request => Response => GetOutput, which gives us the full data flow, client side.
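The flow above can be sketched with toy types. In the example below, Request and Response are plain strings, Codecs is a stand-in for the functions that compiling the input and output schemas would produce, and fakeRemote plays the role of the HTTP client. None of these names come from Smithy4s, and the two-parameter KVStoreOp is a simplification of the five-parameter kind.

```scala
object ClientFlowExample {
  // Toy wire types; a real interpreter would use HTTP requests and responses.
  type Request = String
  type Response = String

  final case class GetInput(key: String)
  final case class GetOutput(value: Option[String])

  // Stand-in for the result of compiling the input and output schemas.
  final case class Codecs[I, O](encode: I => Request, decode: Response => O)

  val getCodecs: Codecs[GetInput, GetOutput] = Codecs[GetInput, GetOutput](
    in => s"GET ${in.key}",
    res => GetOutput(if (res.isEmpty) None else Some(res))
  )

  // Initially-encoded operation, exposing its input and its codecs.
  sealed trait KVStoreOp[I, O] {
    def input: I
    def codecs: Codecs[I, O]
  }
  final case class Get(input: GetInput) extends KVStoreOp[GetInput, GetOutput] {
    def codecs: Codecs[GetInput, GetOutput] = getCodecs
  }

  // Generic client logic, written once for every operation.
  def send[I, O](op: KVStoreOp[I, O], lowLevel: Request => Response): O = {
    val request = op.codecs.encode(op.input) // GetInput => Request
    val response = lowLevel(request)         // Request => Response
    op.codecs.decode(response)               // Response => GetOutput
  }

  trait KVStore {
    def get(key: String): GetOutput
  }

  // kvstore.get => KVStoreOp.Get => GetInput => Request => Response => GetOutput
  def client(lowLevel: Request => Response): KVStore = new KVStore {
    def get(key: String): GetOutput = send(Get(GetInput(key)), lowLevel)
  }

  // Fake remote service, standing in for an HTTP client.
  val fakeRemote: Request => Response = {
    case "GET known" => "value"
    case _           => ""
  }

  val hit: GetOutput = client(fakeRemote).get("known")
  val miss: GetOutput = client(fakeRemote).get("unknown")
}
```

Only the send function knows about the wire; the stub returned by client is an ordinary object-oriented interface.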

Logical flow: server-side

The server side is different in that we want to derive the Request => Response function from an instance of our interface (KVStore). The goal is to mechanically translate a request into a method call, and a method's output into a response. The sequence:

From a given Request, find the corresponding operation Op (for instance, by means of the HTTP path). Let's assume it's the get operation
Retrieve the Smithy4s Schemas (input and output) associated to the operation (KVStoreOp.Get)
Compile a Request => GetInput decoding function, and run the Request through it
From GetInput, recreate the KVStoreOp.Get instance
From KVStoreOp.Get, use the final-encoded dual of KVStoreOp to call the KVStore#get method (implemented by the user). This gets us a GetOutput
Compile a GetOutput => Response encoding function from the schemas, and run the output through it

So we get Request => GetInput => KVStoreOp.Get => kvstore.get => GetOutput => Response, which gives us the full data flow, server side.
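The same kind of toy model can illustrate the server-side flow. Here ToyEndpoint is a hypothetical stand-in for an endpoint together with the codecs compiled from its schemas, requests and responses are plain strings, and the two-parameter KVStoreOp is a simplification. It is a sketch of the sequence of steps, not of the actual Smithy4s router.

```scala
object ServerFlowExample {
  type Request = String
  type Response = String

  final case class GetInput(key: String)
  final case class GetOutput(value: Option[String])

  // The interface implemented by the user.
  trait KVStore {
    def get(key: String): GetOutput
  }

  // Initially-encoded operation that knows how to call the interface.
  sealed trait KVStoreOp[I, O] {
    def run(impl: KVStore): O
  }
  final case class Get(input: GetInput) extends KVStoreOp[GetInput, GetOutput] {
    def run(impl: KVStore): GetOutput = impl.get(input.key)
  }

  // Stand-in for an endpoint plus the codecs compiled from its schemas.
  final case class ToyEndpoint[I, O](
      matches: Request => Boolean,
      decode: Request => I,
      wrap: I => KVStoreOp[I, O],
      encode: O => Response
  )

  val getEndpoint: ToyEndpoint[GetInput, GetOutput] =
    ToyEndpoint[GetInput, GetOutput](
      matches = _.startsWith("GET "),
      decode = req => GetInput(req.stripPrefix("GET ")),
      wrap = in => Get(in),
      encode = out => out.value.getOrElse("")
    )

  // Request => GetInput => KVStoreOp.Get => kvstore.get => GetOutput => Response
  def handle[I, O](endpoint: ToyEndpoint[I, O], impl: KVStore, request: Request): Response = {
    val input = endpoint.decode(request)
    val op = endpoint.wrap(input)
    val output = op.run(impl)
    endpoint.encode(output)
  }

  def router(impl: KVStore): Request => Response = { request =>
    if (getEndpoint.matches(request)) handle(getEndpoint, impl, request)
    else "unknown operation"
  }

  val userImpl: KVStore = new KVStore {
    private val data = Map("known" -> "value")
    def get(key: String): GetOutput = GetOutput(data.get(key))
  }

  val found: Response = router(userImpl)("GET known")
  val notRouted: Response = router(userImpl)("PUT known value")
}
```

The handle function is generic in I and O, so adding an operation would mean adding an endpoint value rather than new routing logic.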

Both the service-side and client-side logical flows guide the design of the abstractions that are exposed by Smithy4s.

A note about efficiency

The flows described above are merely conceptual, and do not account for the optimisations involved to ensure that schemas are not recompiled into codecs on a per-request basis (which would greatly impact performance). Interpreters provided by Smithy4s (HTTP and co) are written to ensure that all compilation is performed ahead of receiving requests, by means of preliminary computations and caching.
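The difference can be pictured with a deliberately simplified sketch, in which compileDecoder is a made-up stand-in for an expensive schema-to-codec compilation and a counter tracks how often it runs. It only illustrates the principle of compiling ahead of time; it says nothing about how the Smithy4s interpreters are actually structured.

```scala
object PrecompilationExample {
  var compilations = 0

  // Stand-in for an expensive schema-to-codec compilation.
  def compileDecoder(): String => Int = {
    compilations += 1
    raw => raw.trim.toInt
  }

  // Recompiles the decoder on every request.
  val naive: String => Int = request => compileDecoder()(request)

  // Compiles once, when the handler is built, and reuses the result.
  val precompiled: String => Int = {
    val decode = compileDecoder()
    request => decode(request)
  }

  // Number of compilations triggered while running `body`.
  def compilationsDuring(body: => Unit): Int = {
    val before = compilations
    body
    compilations - before
  }

  val requests = List("1", "2", "3")
  val costOfNaive: Int = compilationsDuring(requests.foreach(r => naive(r)))
  val costOfPrecompiled: Int = compilationsDuring(requests.foreach(r => precompiled(r)))
}
```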

The Endpoint abstraction

The smithy4s.Endpoint abstraction ties a specific operation to the various schemas associated with it.

trait Endpoint[Op[_, _, _, _, _], I, E, O, SI, SO] {
  def schema: OperationSchema[I, E, O, SI, SO]
  def wrap(input: I): Op[I, E, O, SI, SO]
}

where smithy4s.schema.OperationSchema is a product of all schemas involved in a specific operation.

final case class OperationSchema[I, E, O, SI, SO](
    id: ShapeId,
    hints: Hints,
    input: Schema[I],
    error: Option[ErrorSchema[E]],
    output: Schema[O],
    streamedInput: Option[StreamingSchema[SI]],
    streamedOutput: Option[StreamingSchema[SO]]
) {
  // ...
}

Endpoints are not type-classes. Instead, an Endpoint instance is provided by the companion object of each member of the GADT forming the initial-encoding of the service interface. So, going back to our KVStore, the corresponding sealed-trait would look like this :

sealed trait KVStoreOp[Input, Error, Output, StreamedInput, StreamedOutput]

and the put operation would look like :

case class Put(input: PutRequest) extends KVStoreOp[PutRequest, PutError, PutResult, Nothing, Nothing]
object Put extends Endpoint[KVStoreOp, PutRequest, PutError, PutResult, Nothing, Nothing] {
  // The input, error and output schemas are bundled in a single OperationSchema.
  // No streaming schema is set, which matches the two Nothing type parameters.
  val schema: OperationSchema[PutRequest, PutError, PutResult, Nothing, Nothing] =
    Schema.operation(ShapeId("namespace", "Put"))
      .withInput(PutRequest.schema)
      .withError(PutError.errorSchema)
      .withOutput(PutResult.schema)
  // ...
  // Wraps an input value into the corresponding member of the GADT.
  def wrap(input: PutRequest) = Put(input)
}

A note on errors

As stated previously, Smithy4s generates a coproduct type for each operation, where the members of the coproduct point to the various errors listed in the Smithy operation shape. Additionally, each structure annotated with @error in Smithy is rendered as a case class that extends Throwable, because Throwables are the de facto standard for error handling on the JVM. Even libraries that use Either to perform error handling often represent the left-hand side of the Either as some throwable type, to facilitate the absorption of errors into the error channels of monadic constructs (IO.raiseError, etc.).

As a result, it is important for Smithy4s to expose functions that generically enable the filtering of throwables against the Error type parameter of an operation, so that interpreters can intercept errors and apply the correct encoding (dictated via Schema) before communicating them back to the caller over the wire. Conversely, it is important to expose a function that goes from the generic Error type parameter to Throwable, so that errors received via low-level communication channels can be turned into Throwable at the client call site, in order to populate the relevant error channel when exposing mono-functor semantics.

Therefore, when a smithy operation has errors defined, the corresponding smithy4s.schema.OperationSchema references a smithy4s.schema.ErrorSchema, which looks like this :

case class ErrorSchema[E] private[smithy4s] (
    schema: Schema[E],
    liftError: Throwable => Option[E],
    unliftError: E => Throwable
)
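To show how these two functions might be used, here is a self-contained sketch. ToyErrorSchema mirrors ErrorSchema without its Schema field, and KeyNotFound, GetError, encodeFailure and decodeFailure are invented for the example. The status strings are placeholders for whatever encoding a real protocol would dictate.

```scala
import scala.util.{Failure, Try}

object ErrorExample {
  // Hypothetical generated error, rendered as a Throwable.
  final case class KeyNotFound(key: String) extends Throwable

  // Hypothetical coproduct of the errors listed on the Get operation.
  sealed trait GetError
  final case class KeyNotFoundCase(error: KeyNotFound) extends GetError

  // Stand-in for ErrorSchema, without the Schema field.
  final case class ToyErrorSchema[E](
      liftError: Throwable => Option[E],
      unliftError: E => Throwable
  )

  val getErrors: ToyErrorSchema[GetError] = ToyErrorSchema[GetError](
    liftError = {
      case e: KeyNotFound => Some(KeyNotFoundCase(e))
      case _              => None
    },
    unliftError = { case KeyNotFoundCase(e) => e }
  )

  // Server side: filter a thrown error against the operation's Error type,
  // so that known errors get a dedicated encoding.
  def encodeFailure(thrown: Throwable): String =
    getErrors.liftError(thrown) match {
      case Some(KeyNotFoundCase(e)) => s"404 key not found: ${e.key}"
      case None                     => "500 internal error"
    }

  // Client side: turn a decoded error back into a Throwable, so that it can
  // populate the error channel of the effect (Try, in this sketch).
  def decodeFailure(key: String): Try[Nothing] =
    Failure(getErrors.unliftError(KeyNotFoundCase(KeyNotFound(key))))
}
```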