OCaml GADT Encodings: Subtyping & Variance

Updated 7 February 2026

OCaml GADT encodings are systematic approaches to represent advanced type relations, such as subtyping and indexed type equalities, with zero runtime overhead.
They rely on higher-rank polymorphism and higher-kinded types, both expressed through OCaml's module system, and optionally on the modular implicits extension, to enable type-safe coercions and variance proofs.
These encodings support generic programming patterns and practical applications including covariant wrappers, heterogeneous object arrays, and bounded quantification.

OCaml GADT encodings are systematic approaches for internalizing advanced type relations, such as subtyping and indexed type equalities, using Generalized Algebraic Data Types (GADTs) in OCaml. These encodings exploit OCaml's type system—including higher-rank polymorphism, higher-kinded types (via first-class modules), and, optionally, the modular implicits extension—to represent type-level information and relationships as first-class values. The core motivation is to allow user-defined libraries to encode type-indexed propositions and relations (notably subtyping) while enabling type-safe coercions, variance proofs, and generic programming patterns with minimal runtime overhead.

1. Logical Foundations of GADT-Based Subtyping Encodings

Formalizations of subtyping in OCaml GADT encodings derive from Liskov and Wing's behavioral subtyping: $S <: T$ iff every property provable for all $x:T$ is also provable for all $y:S$. In type-theoretic terms, this principle is expressed as $\forall\phi.~\phi(T) \rightarrow \phi(S)$, where $\phi$ ranges over contravariant (consumer) contexts.

Under the Curry–Howard correspondence, such logical relations are encoded as witness types, typically written $(-'a, +'b)\,\mathtt{sub}$, denoting evidence of $a \leq b$. This is operationalized through module types for positive (covariant) and negative (contravariant) contexts:

module type POS = sig type +'a t end (* positive contexts, covariant *)
module type NEG = sig type -'a t end (* negative contexts, contravariant *)
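As a small illustration in standard OCaml (no extensions), the sketch below instantiates both signatures and uses the compiler's built-in, variance-aware coercion. The names `ListP`, `PrinterN`, `small`, `big`, `widen`, `narrow`, and `show` are hypothetical and do not come from the source.

```ocaml
module type POS = sig type +'a t end (* positive contexts, covariant *)
module type NEG = sig type -'a t end (* negative contexts, contravariant *)

(* Hypothetical instances: lists produce elements, printers consume them. *)
module ListP : POS with type 'a t = 'a list = struct
  type 'a t = 'a list
end

module PrinterN : NEG with type 'a t = 'a -> string = struct
  type 'a t = 'a -> string
end

(* [small] is a subtype of [big] under built-in variant subtyping. *)
type small = [ `A ]
type big = [ `A | `B ]

(* Covariant context: a list of [small] can be viewed as a list of [big]. *)
let widen (xs : small ListP.t) = (xs : small ListP.t :> big ListP.t)

(* Contravariant context: a printer for [big] also prints [small] values. *)
let narrow (p : big PrinterN.t) = (p : big PrinterN.t :> small PrinterN.t)

let show : big -> string = function `A -> "A" | `B -> "B"
```

The direction of the coercion flips between the two contexts, which is exactly the distinction the `+` and `-` annotations record.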

The minimal logical interface for subtyping, typically named SUB, provides:

A witness type $(-'a, +'b)\,t$ for $a \leq b$,
Reflexivity: $\mathtt{refl}: (a, a)~t$,
Lifting: $\mathtt{lift}: \{P:\mathtt{POS}\} \rightarrow (a, b)~t \rightarrow (a~P.t, b~P.t)~t$,
A GADT-style coercion operator: $( :> ): a \rightarrow (a, b)~t \rightarrow b$.

All concrete encodings maintain zero-cost coercion: all runtime conversions are just the identity function (Yallop et al., 2019).

2. Three Interdefinable Encodings of Subtyping Witnesses

Three principal OCaml GADT encodings instantiate the SUB interface for first-class subtyping. Each is fully interconvertible via a canonical conversion function. They are:

Negative-Context Encoding

Represents $(a, b)~t$ as a higher-rank function that, for each negative context $N$ (a consumer), turns a $b$-consumer into an $a$-consumer:

type (-'a, +'b) t = { N:NEG } -> ('b N.t -> 'a N.t)

Reflexivity and lifting are implemented as:

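$\mathtt{refl}~\{N:\mathtt{NEG}\}~x = x$
$\mathtt{lift}~\{P:\mathtt{POS}\}~s~\{Q:\mathtt{NEG}\}~x = s~\{N = \mathtt{COMPOSE}(Q, P)\}~x$

Here $\mathtt{COMPOSE}(Q, P)$ is the negative context $'c \mapsto {'c}~P.t~Q.t$, supplied as the argument $N$ of $s$. Coercion instantiates $N$ with the function context $'c~N.t = {'c} \rightarrow b$ and passes in the identity on $b$, which yields a function $a \rightarrow b$.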
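The negative-context encoding can be approximated in standard OCaml, without modular implicits, by passing the context as an explicit functor argument. The following is a minimal sketch under that assumption, not the source's implementation: the module type `SUB`, the functor names `Cast` and `Lift`, and the example types `small` and `big` are hypothetical. Because first-class module (package) types are invariant, the alias `sub` here does not carry the `(-'a, +'b)` variance of the original type.

```ocaml
module type POS = sig type +'a t end
module type NEG = sig type -'a t end

(* Witness of a <= b: every consumer of b can be used as a consumer of a. *)
module type SUB = sig
  type a
  type b
  module Cast (N : NEG) : sig val cast : b N.t -> a N.t end
end

(* Package types are invariant, so this alias has no variance annotation. *)
type ('a, 'b) sub = (module SUB with type a = 'a and type b = 'b)

let refl (type x) () : (x, x) sub =
  (module struct
    type a = x
    type b = x
    module Cast (N : NEG) = struct let cast (c : x N.t) = c end
  end)

(* Coercion: choose the context of functions into y, then pass the identity. *)
let coerce (type x) (type y) (v : x)
    (module S : SUB with type a = x and type b = y) : y =
  let module N = struct type -'c t = 'c -> y end in
  let module C = S.Cast (N) in
  C.cast (fun (w : y) -> w) v

(* Lifting: compose the positive context P with each consumer context Q. *)
module Lift (P : POS) = struct
  let lift (type x) (type y)
      (module S : SUB with type a = x and type b = y) : (x P.t, y P.t) sub =
    (module struct
      type a = x P.t
      type b = y P.t
      module Cast (Q : NEG) = struct
        module QP = struct type -'c t = 'c P.t Q.t end
        module C = S.Cast (QP)
        let cast = C.cast
      end
    end)
end

(* A non-trivial witness, justified by built-in variant subtyping. *)
type small = [ `A ]
type big = [ `A | `B ]

let small_sub_big : (small, big) sub =
  (module struct
    type a = small
    type b = big
    module Cast (N : NEG) = struct
      let cast (c : big N.t) = (c : big N.t :> small N.t)
    end
  end)

module ListP = struct type +'a t = 'a list end
module ListLift = Lift (ListP)

let widened : big list = coerce [ `A; `A ] (ListLift.lift small_sub_big)
```

Every `cast` in this sketch is either the identity or a built-in coercion, so no value is rebuilt; the only runtime work is the functor applications that stand in for implicit module arguments.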

Positive-Context Encoding

Models $(a, b)~t$ as a function that, for each positive context $P$ (a producer), turns an $a$-producer into a $b$-producer:

type (-'a, +'b) t = { P:POS } -> ('a P.t -> 'b P.t)

Coercion instantiates $P$ with the identity context $'c~P.t = {'c}$, which turns the witness directly into a function $a \rightarrow b$.
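A sketch of the positive-context encoding in standard OCaml follows, again replacing the implicit module argument with an explicit functor. The names `SUB`, `Cast`, `Identity`, `point`, and `cpoint` are hypothetical, and the package-type alias `sub` is invariant, unlike the annotated type above.

```ocaml
module type POS = sig type +'a t end

(* Witness of a <= b: every producer of a can be used as a producer of b. *)
module type SUB = sig
  type a
  type b
  module Cast (P : POS) : sig val cast : a P.t -> b P.t end
end

type ('a, 'b) sub = (module SUB with type a = 'a and type b = 'b)

(* The identity context: a producer of a value is the value itself. *)
module Identity = struct type +'a t = 'a end

(* Coercion instantiates the witness at the identity context. *)
let coerce (type x) (type y) (v : x)
    (module S : SUB with type a = x and type b = y) : y =
  let module C = S.Cast (Identity) in
  C.cast v

(* Object types: a colored point has every method a point has. *)
type point = < x : int >
type cpoint = < x : int; color : string >

let cpoint_sub_point : (cpoint, point) sub =
  (module struct
    type a = cpoint
    type b = point
    module Cast (P : POS) = struct
      let cast (c : cpoint P.t) = (c : cpoint P.t :> point P.t)
    end
  end)

let red : cpoint = object method x = 1 method color = "red" end

let as_point : point = coerce red cpoint_sub_point
```

Compared with the negative-context version, no function context or identity argument is needed at the coercion site, which is why this encoding tends to read more directly when only covariance is involved.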

Church (Initial Algebra) Encoding

Encodes $(a, b)~t$ as a Church-style rank-2 type: for every implementation $S$ of SUB, provides an $S.t$ witness:

type (-'a, +'b) t = { S:SUB } -> ('a,'b) S.t

All three modules satisfy the same interface, and conversion between them is accomplished via the general function:

val conv : {A:SUB} -> {B:SUB} -> ('a,'b) A.t -> ('a,'b) B.t

by lifting the A-witness through the positive context $'c \mapsto (a, {'c})~B.t$ and then using A's coercion to turn B's reflexivity proof $(a, a)~B.t$ into $(a, b)~B.t$ (Yallop et al., 2019).
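The same idea can be sketched for two concrete encodings in standard OCaml. The function below converts a positive-context witness into a negative-context one by instantiating the positive witness at a context built from the consumer and applying it to the identity. This is a specialization written for illustration, not the generic `conv` above; `SUB_POS`, `SUB_NEG`, `pos_to_neg`, and `coerce_neg` are hypothetical names, and the functor arguments stand in for implicit modules.

```ocaml
module type POS = sig type +'a t end
module type NEG = sig type -'a t end

module type SUB_POS = sig
  type a
  type b
  module Cast (P : POS) : sig val cast : a P.t -> b P.t end
end

module type SUB_NEG = sig
  type a
  type b
  module Cast (N : NEG) : sig val cast : b N.t -> a N.t end
end

type ('a, 'b) pos_sub = (module SUB_POS with type a = 'a and type b = 'b)
type ('a, 'b) neg_sub = (module SUB_NEG with type a = 'a and type b = 'b)

let pos_to_neg (type x) (type y)
    (module S : SUB_POS with type a = x and type b = y) : (x, y) neg_sub =
  (module struct
    type a = x
    type b = y
    module Cast (N : NEG) = struct
      (* Covariant context: N.t is contravariant and sits left of an arrow. *)
      module P = struct type +'c t = 'c N.t -> x N.t end
      module C = S.Cast (P)
      (* C.cast : (x N.t -> x N.t) -> (y N.t -> x N.t), fed the identity. *)
      let cast = C.cast (fun (c : x N.t) -> c)
    end
  end)

(* Coercion for the negative encoding, used to observe the result. *)
let coerce_neg (type x) (type y) (v : x)
    (module S : SUB_NEG with type a = x and type b = y) : y =
  let module N = struct type -'c t = 'c -> y end in
  let module C = S.Cast (N) in
  C.cast (fun (w : y) -> w) v

type small = [ `A ]
type big = [ `A | `B ]

let small_sub_big : (small, big) pos_sub =
  (module struct
    type a = small
    type b = big
    module Cast (P : POS) = struct
      let cast (c : small P.t) = (c : small P.t :> big P.t)
    end
  end)

let converted : (small, big) neg_sub = pos_to_neg small_sub_big

let result : big = coerce_neg `A converted
```

The conversion introduces no new runtime behavior: it only re-applies the original witness at a different context, so the converted witness still acts as the identity on values.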

3. Applications: Variance, Object Subtyping, and Quantification

First-class subtyping encodings offer immediate applications in several domains:

Covariant Wrappers: Invariant types (e.g., Lzy.t) can be wrapped to provide covariance, yielding new types (CovLzy.t) indexed by a subtyping witness and enabling safe coercion at force time.
Object Arrays with Row Subtyping: By encoding arrays as existential wrappers paired with subtyping witnesses, one can, e.g., process heterogeneous arrays of objects sharing common methods using subtyping.
Selective Abstraction (Private Types): Abstract types can carry public subtyping witnesses (e.g., t_sub_int : (t, int) sub), allowing one-way coercion in an API without inversion.
Bounded Quantification: Universal bounds over subtypes (e.g., $\forall\alpha \leq t.~\alpha \rightarrow t$) can be directly encoded as records parameterized over a subtyping witness.
Variance Proofs: Proofs of variance (covariance, contravariance) are witnessed by functions such as $\mathtt{list\_cov} : (a, b)\,\mathtt{sub} \rightarrow (a\,\mathtt{list}, b\,\mathtt{list})\,\mathtt{sub}$ (Yallop et al., 2019).
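Two of the applications above, selective abstraction and bounded quantification, can be sketched in standard OCaml on top of a functor-based positive-context witness. This is an illustrative sketch under that assumption: `Ticket`, `fresh`, `bounded`, `apply`, and `upcast` are hypothetical names, while `t_sub_int` follows the naming used in the list above.

```ocaml
module type POS = sig type +'a t end

(* Positive-context witness of a <= b, with an explicit functor argument. *)
module type SUB = sig
  type a
  type b
  module Cast (P : POS) : sig val cast : a P.t -> b P.t end
end

type ('a, 'b) sub = (module SUB with type a = 'a and type b = 'b)

let refl (type x) () : (x, x) sub =
  (module struct
    type a = x
    type b = x
    module Cast (P : POS) = struct let cast (v : x P.t) = v end
  end)

module Identity = struct type +'a t = 'a end

let coerce (type x) (type y) (v : x)
    (module S : SUB with type a = x and type b = y) : y =
  let module C = S.Cast (Identity) in
  C.cast v

(* Selective abstraction: Ticket.t is abstract, yet converts to int. *)
module Ticket : sig
  type t
  val fresh : unit -> t
  val t_sub_int : (t, int) sub
end = struct
  type t = int
  let counter = ref 0
  let fresh () = incr counter; !counter
  (* Inside the module t = int, so reflexivity provides the witness. *)
  let t_sub_int : (t, int) sub = refl ()
end

(* Bounded quantification: forall a <= b. a -> b, stored in a record. *)
type 'b bounded = { apply : 'a. ('a, 'b) sub -> 'a -> 'b }

let upcast : int bounded = { apply = (fun w v -> coerce v w) }

let first : int = upcast.apply Ticket.t_sub_int (Ticket.fresh ())
```

Clients can turn a `Ticket.t` into an `int`, but the signature exposes no witness in the other direction, so an arbitrary `int` cannot be passed off as a ticket.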

4. Role of Modular Implicits and Higher-Kinded Quantification

All three encodings crucially rely on higher-rank, higher-kinded quantification (over POS, NEG, and SUB modules). The modular implicits extension to OCaml automates the resolution of these module parameters, greatly reducing verbosity and ensuring that the variance annotations propagate through the code as enforced by OCaml's type checker.

Without modular implicits, users are forced to manually thread functor or module arguments at every use site of lift or (:>), which is verbose and error-prone. Modular implicits enable seamless type inference of module context, making first-class subtyping libraries practical for user code (Yallop et al., 2019).

5. Expressivity, Zero-Cost Guarantees, and Limitations

All three subtyping encodings are equally expressive: each can be converted into either of the others. Every coercion and every lifted witness behaves as the identity function at runtime, so no data is copied and no runtime checks occur.

The trade-offs are:

Verbosity and Mechanistic Details: Negative-context encodings involve more explicit context composition; positive encodings are more concise when only covariance is required; Church encodings present a smaller kernel but can be less modular.
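Module Machinery: All approaches quantify over type constructors through modules. Standard OCaml provides modules, functors, and first-class modules, while the implicit binder syntax (such as { N:NEG } ->) used in the definitions above belongs to the modular implicits extension. Without such features, less direct encodings are necessary.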
Lack of Inversion: Unlike built-in GADTs, these encodings do not provide pattern-matchable, invertible proof objects. For instance, it's not generally possible to invert a ('a list, 'b list) sub into a corresponding ('a, 'b) sub (Yallop et al., 2019).
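For contrast, the sketch below shows the kind of inversion that a built-in GADT proof object does support. It uses the standard equality witness rather than a subtyping witness; the names `eq`, `Refl`, `invert_list`, and `cast` are illustrative and not taken from the source.

```ocaml
(* A GADT equality witness: the only constructor forces both indices equal. *)
type (_, _) eq = Refl : ('a, 'a) eq

(* Inversion: list is an injective datatype constructor, so matching on
   Refl lets the type checker conclude a = b from a list = b list. *)
let invert_list : type a b. (a list, b list) eq -> (a, b) eq = function
  | Refl -> Refl

(* Using a witness: matching on Refl refines a to b in the body. *)
let cast : type a b. (a, b) eq -> a -> b = fun Refl v -> v
```

The subtyping witnesses discussed here are functions over contexts rather than constructors, so there is nothing to pattern match on and no analogous `invert_list` can be written for them.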

A summary is presented in the table below:

Encoding	Mechanism Involved	Cost	Invertibility	API Ergonomics
Negative-context	Contravariant funs	Zero	No	Verbose for complex compositions
Positive-context	Covariant funs	Zero	No	Concise with covariance
Church	Initial algebra	Zero	No	Smallest kernel, less modular

6. GADT Encodings: Church vs. Fixpoint in OCaml

Orthogonally to subtyping, GADTs in OCaml can be encoded either via Church encodings or as (higher-order) functorial fixpoints. The Church encoding exposes existential type indices in constructors without requiring a uniform map operation. For example, in standard GADT syntax:

type _ expr =
  | Int : int -> int expr
  | Bool : bool -> bool expr
  | Add : int expr * int expr -> int expr
  | Eq  : 'a expr * 'a expr -> bool expr

This encoding precludes a well-typed universal map : ('a -> 'b) -> 'a expr -> 'b expr, as in specific cases (e.g., Eq), the index changes non-uniformly, violating functor laws (Johann et al., 2021).
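What the indexed type does support is index-refining pattern matching. As a minimal sketch, an evaluator over the `expr` type above can return a value of exactly the indexed type; the function name `eval` is an assumption introduced here for illustration.

```ocaml
type _ expr =
  | Int : int -> int expr
  | Bool : bool -> bool expr
  | Add : int expr * int expr -> int expr
  | Eq : 'a expr * 'a expr -> bool expr

(* Each branch refines the index a: to int for Int and Add, to bool for
   Bool and Eq. The annotation enables the polymorphic recursion needed
   to evaluate the existentially typed operands of Eq. *)
let rec eval : type a. a expr -> a = function
  | Int n -> n
  | Bool b -> b
  | Add (l, r) -> eval l + eval r
  | Eq (l, r) -> eval l = eval r
```

In the `Eq` case the operand type is hidden from the caller, which is one concrete reason a single `map` over the index cannot be given a sensible definition.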

In contrast, the fixpoint encoding defines GADTs as $\mu$ of higher-order functors, explicitly equipping the body functor with a lawful map operation. This enables traversals and naturality properties, at the cost of giving up the full relational parametricity of the Church approach:

Church encoding: Preserves full parametricity; no universal map
Fixpoint encoding: Allows naturality- and fusion-style theorems via map, but loses non-uniform inhabitance/freeness properties

This trade-off is formalized: no model of GADTs can simultaneously provide all of GADTs, functoriality, and parametricity—the "pick two" result (Johann et al., 2021).

7. Implications and Use-Case Guidance

In OCaml, the choice of encoding for GADTs (and, by extension, for advanced type relations such as subtyping) should be guided by intended usage:

For type-indexed, pattern-matching style programming (typed ASTs, DSLs, invariants), use direct Church-style GADT syntax and encodings. These offer full parametric reasoning and simple pattern-matching semantics.
For generic traversals, pipelines, or container-like usage (requiring functoriality), employ explicit fixpoint representations, supplying a lawful map operation. Note that this precludes full non-uniform parametricity.
For user-exposed subtyping libraries, use the Church or positive-context encoding with modular implicits, which gives a flexible, lightweight interface akin to OCaml's built-in variance mechanisms and object subtyping, with coercions that are the identity at runtime (Yallop et al., 2019; Johann et al., 2021).

The impossibility of realizing all three properties together (GADTs, parametricity, functoriality) in one encoding requires careful attention in the design of type-level libraries. The conversions and existence of multiple equivalent encodings permit a high degree of flexibility in practical OCaml type-level programming.

References

Yallop et al. (2019). First-Class Subtypes.

Johann et al. (2021). GADTs, Functoriality, Parametricity: Pick Two.
