Composition Tokens: Structure and Applications

Updated 29 June 2026

Composition tokens are discrete or continuous entities whose semantics arise from explicitly combining components under defined rules such as additive bundling, orthogonal decomposition, or convertibility.
They are applied in visual tokenization, prompt composition for multi-task models, and blockchain finance to enable structured abstraction and interoperability.
Reported results show improved reconstruction and multi-behavior accuracy, clearer information disentanglement, and more traceable layered risk in decentralized asset systems.

A composition token is a token—discrete or continuous—whose semantics and functionality arise from explicitly representing or enabling the combination of underlying components, sources, behaviors, or modalities. In contemporary machine learning, blockchain finance, and digital asset frameworks, composition tokens serve as structural primitives for hierarchical abstraction, information disentanglement, or on-chain claim layering. Their design is governed by explicit rules of composition, orthogonality, or convertibility, yielding principled mechanisms for interpretability, interoperability, and risk management.

1. Formal Foundations and Definitions

In high-dimensional representation learning and decentralized asset systems, composition tokens are defined by strict mathematical or protocol-level composition rules:

In visual tokenization (Semanticist), a composition token $z_i$ is a continuous vector in a causal sequence such that each token adds statistically orthogonal information to the latent representation. The tokens are constructed to mirror principal components: $z_1$ captures the largest share of explained variance, $z_2$ the next largest, and so on down to $z_K$, which yields a provable hierarchy of information (Wen et al., 11 Mar 2025).
In DeFi and on-chain asset protocols, composition tokens denote recursively stacked claims, where each protocol-issued token (e.g., a staking derivative, lending receipt, or synthetic asset) is a claim on a lower-tier token, forming an explicit hierarchy or directed graph of dependencies (Wu, 2 Mar 2026, Harrigan et al., 2024).
In two-tier asset tokenization architectures, "element" tokens $e_1,\ldots,e_n$ represent granular asset components, and the "everything" (composition) token $E$ is a fixed bundle $E \equiv w_1 e_1 + \cdots + w_n e_n$, with contractually enforced one-to-one convertibility (Borjigin et al., 15 Aug 2025).
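The two-tier definition can be made concrete with a toy ledger. The sketch below is a minimal illustration, assuming integer weights, a single holder, and no fees; the class `BundleLedger`, its methods, and the element names `land` and `carbon` are hypothetical and do not reproduce the contracts of Borjigin et al. Minting locks $w_i n$ units of each $e_i$ in exchange for $n$ units of $E$, and redeeming reverses it, so neither operation changes the holder's total element-equivalent position.

```python
from dataclasses import dataclass, field


@dataclass
class BundleLedger:
    """Toy two-tier ledger in which one unit of E is a fixed bundle of w_i units of each e_i."""

    weights: dict[str, int]  # w_i for each element e_i (hypothetical integer units)
    elements: dict[str, int] = field(default_factory=dict)  # holder's element balances
    bundles: int = 0  # holder's balance of E

    def mint(self, n: int) -> None:
        """Lock w_i * n of every element and issue n units of E (all-or-nothing)."""
        for name, w in self.weights.items():
            if self.elements.get(name, 0) < w * n:
                raise ValueError(f"insufficient {name} to mint {n} E")
        for name, w in self.weights.items():
            self.elements[name] = self.elements.get(name, 0) - w * n
        self.bundles += n

    def redeem(self, n: int) -> None:
        """Burn n units of E and release w_i * n of every element."""
        if self.bundles < n:
            raise ValueError(f"insufficient E to redeem {n}")
        self.bundles -= n
        for name, w in self.weights.items():
            self.elements[name] = self.elements.get(name, 0) + w * n

    def nav(self, prices: dict[str, float]) -> float:
        """Net asset value of one E: sum_i w_i * price(e_i)."""
        return sum(w * prices[name] for name, w in self.weights.items())


# Hypothetical bundle: 1 unit of "land" plus 2 units of "carbon" per E.
ledger = BundleLedger(weights={"land": 1, "carbon": 2}, elements={"land": 3, "carbon": 5})
ledger.mint(2)
print(ledger.bundles, ledger.elements)  # 2 {'land': 1, 'carbon': 1}
ledger.redeem(1)
print(ledger.bundles, ledger.elements)  # 1 {'land': 2, 'carbon': 3}
print(ledger.nav({"land": 10.0, "carbon": 4.0}))  # 18.0
```

Checking every balance in `mint` before mutating any of them keeps the operation all-or-nothing, mirroring the atomicity a single smart-contract transaction would provide.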

On the technical level, the defining criteria are:

Additivity or strict bundling (e.g., fixed or dynamic sum/product rules in finance and vision);
Orthogonality and ordering (e.g., uncorrelated contributions and descending importance in representation learning);
Composable addressability (e.g., support for plug-and-play or instruction-based composition).
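The orthogonality and ordering criteria can be checked mechanically on a set of incremental effect vectors $d_1,\ldots,d_K$. A minimal pure-Python sketch, assuming increments arrive as equal-length lists of floats and using vector norm as a stand-in for importance; `check_composition` and the tolerance value are illustrative choices, not taken from the cited work.

```python
import math


def dot(u: list[float], v: list[float]) -> float:
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def check_composition(increments: list[list[float]], tol: float = 1e-9) -> tuple[bool, bool]:
    """Return (pairwise_orthogonal, descending_norms) for increments d_1..d_K.

    pairwise_orthogonal: |<d_i, d_j>| <= tol for every i < j.
    descending_norms:    ||d_1|| >= ||d_2|| >= ... >= ||d_K||, used here as a
                         proxy for "descending importance".
    """
    orthogonal = all(
        abs(dot(increments[i], increments[j])) <= tol
        for i in range(len(increments))
        for j in range(i + 1, len(increments))
    )
    norms = [math.sqrt(dot(d, d)) for d in increments]
    descending = all(a >= b for a, b in zip(norms, norms[1:]))
    return orthogonal, descending


print(check_composition([[3.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.5]]))  # (True, True)
print(check_composition([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]))  # (False, False)
```

When increments are pairwise orthogonal, $\|\sum_i d_i\|^2 = \sum_i \|d_i\|^2$, so each token's share of the total can be read off independently; that additivity is what gives the ordering criterion its meaning.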

2. Composition Token Architectures and Mechanisms

The architecture of composition tokens is contingent on modality and application domain:

Sequential Visual Tokenizers (Semanticist, COMiT): Images are encoded into a 1D causal sequence of $K$ composition tokens $\{z_1,\ldots,z_K\}$ using a ViT backbone with causal masking. Training enforces a PCA-like structure by nested dropout and orthogonality constraints: each token explains the maximal remaining variance, and incremental "effects" $\Delta\epsilon_i$ are constrained to be pairwise orthogonal (Wen et al., 11 Mar 2025). COMiT implements a recurrent, communication-inspired sequence, where each update refines a global latent code upon observing local crops, leading to interpretable, object-centric tokens with compositional semantics (Davtyan et al., 24 Feb 2026).
Prompt Composition in Multi-Task Models (SpeechComposer): SpeechComposer defines a minimal set of primitive prompt tokens, with composite tasks (e.g., voice conversion) emerging by concatenating primitive token sequences. This allows a vanilla decoder-only Transformer to multiplex tasks via linear prompt composition, enforcing task modularity and enabling parameter sharing (Wu et al., 2024).
Behavioral and Steering Tokens in LLMs: Individual steering tokens are learned via self-distillation against natural-language instructions, while a dedicated composition token, $\langle\mathrm{and}\rangle$, is optimized for compositional generalization. At inference, arbitrary combinations of behaviors are enacted by interleaving $\langle\mathrm{and}\rangle$ with steering tokens in the prompt, yielding zero-shot composition (Radevski et al., 8 Jan 2026).
Financial and Asset Tokenization: In DeFi, composition tokens arise as each protocol layer accepts tokens and issues higher-tier derivative claims (e.g., ETH $\to$ stETH $\to$ awstETH), forming an explicit credit hierarchy that has been empirically mapped as a directed graph over contract addresses (Wu, 2 Mar 2026, Harrigan et al., 2024). In two-tier asset tokenization, composition tokens enforce on-chain mint/redeem operations that align a "bundle" token $E$ with a set of elements $e_1,\ldots,e_n$ at programmable ratios (Borjigin et al., 15 Aug 2025).
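Nested dropout, one of the training constraints named above for Semanticist, trains on random prefixes of the token sequence so that early tokens must carry the most information. The sketch shows only the masking step, assuming tokens are lists of floats and that the kept prefix length $k$ is uniform over $1,\ldots,K$; both the helper name `nested_dropout` and that sampling distribution are assumptions rather than the paper's exact recipe.

```python
import random


def nested_dropout(tokens: list[list[float]], rng: random.Random) -> tuple[list[list[float]], int]:
    """Keep a random prefix z_1..z_k of the K tokens and zero out z_{k+1}..z_K.

    Assumptions for illustration: k is uniform over 1..K, and zeros stand in for
    whatever mask embedding a real model would substitute for dropped tokens.
    """
    k = rng.randint(1, len(tokens))  # randint is inclusive at both ends
    masked = [list(z) if i < k else [0.0] * len(z) for i, z in enumerate(tokens)]
    return masked, k


rng = random.Random(0)
tokens = [[float(i + 1)] * 4 for i in range(8)]  # K = 8 toy tokens of width 4
masked, k = nested_dropout(tokens, rng)
print(f"kept {k} of {len(tokens)} tokens")
```

During training, a decoder would reconstruct the input from `masked`; because any suffix may be dropped, the reconstruction loss pushes coarse, high-variance content into the earliest tokens.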

3. Mathematical and Structural Properties

The key mathematical properties of composition tokens include:

Explained-Variance Decomposition (visual tokenization): The sequence of incremental effects $\Delta\epsilon_1,\ldots,\Delta\epsilon_K$ (defined using learned guidance scales) mirrors principal component analysis, and the residual is orthogonal to all active increments. This induces a structural hierarchy: earlier tokens encode coarse semantics, and later tokens refine detail (Wen et al., 11 Mar 2025).
Orthogonality: In both Semanticist and compositional LLM steering, tokens are regularized so their incremental effects or embeddings are orthogonal, ensuring that composition yields non-redundant, interpretable structure (Wen et al., 11 Mar 2025, Radevski et al., 8 Jan 2026).
Layered Credit Graphs and Graph-Theoretic Depth: In on-chain hierarchy analysis, the composition graph, a directed graph whose nodes are token contracts and whose edges are claim dependencies, supports connected-component and cycle analysis and quantifies path depth (the longest compositional chain), revealing structural bottlenecks and risk (Harrigan et al., 2024).
Arbitrage Band Enforcement: In the two-tier asset architecture, convertibility constraints bound the price of $E$: arbitrage keeps it within a band around its net asset value $\sum_{i=1}^{n} w_i P(e_i)$, where $P(\cdot)$ denotes market price (Borjigin et al., 15 Aug 2025).
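Path depth can be computed with a memoized depth-first search over the composition graph. This sketch assumes the graph is given as a mapping from each token to the lower-tier tokens it directly claims on; ETH, stETH, and awstETH follow the chain cited earlier, while `vaultShare` and `stableX` are hypothetical. Cycles are reported as errors because depth is unbounded on a cycle.

```python
def composition_depths(claims: dict[str, list[str]]) -> dict[str, int]:
    """Map each token to the hop count of its longest claim chain down to a base asset.

    `claims` maps a token to the lower-tier tokens it directly claims on; base assets
    either map to [] or do not appear as keys. Raises ValueError on a cycle.
    """
    depth: dict[str, int] = {}
    on_path: set[str] = set()  # tokens on the current DFS path, for cycle detection

    def visit(token: str) -> int:
        if token in depth:
            return depth[token]
        if token in on_path:
            raise ValueError(f"cycle detected at {token}")
        on_path.add(token)
        underlying = claims.get(token, [])
        d = 1 + max(visit(u) for u in underlying) if underlying else 0
        on_path.remove(token)
        depth[token] = d
        return d

    nodes = set(claims) | {u for us in claims.values() for u in us}
    for token in sorted(nodes):
        visit(token)
    return depth


claims = {
    "stETH": ["ETH"],
    "awstETH": ["stETH"],
    "vaultShare": ["awstETH", "stableX"],  # hypothetical higher-tier token
}
depths = composition_depths(claims)
print(depths["awstETH"], max(depths.values()))  # 2 3
```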

Table: Composition Token Structural Properties (selected modalities)

Property	Vision (Semanticist)	LLM Steering	DeFi/Asset Tokenization
Information Law	PCA-like, orthogonal increments	Orthogonal embeddings	DAG/layered claims
Ordering	Descending explained variance	Token sequence	Tiers/graph depth
Composability	Prefix-length decoded images	Token chaining	On-chain mint/redeem, graph paths
Interpretability	Semantics mapped to early tokens	Explicit behaviors	Asset/claim provenance
Risk/Robustness	Spectrum-decoupled representations	Generalizes to unseen compositions	Counterparty risk/collapse

4. Empirical Performance and Interpretability Gains

Composition token implementations have yielded empirical gains in representation compactness, interpretability, and downstream task efficiency:

Semanticist achieves state-of-the-art visual reconstruction (rFID ≈ 0.72 on ImageNet-256) with as few as 32 tokens, and human perceptual tests indicate that only 2–4 tokens suffice to induce natural-scene confusion at a 50% rate, paralleling the global precedence effect in human vision (Wen et al., 11 Mar 2025).
Compositional LLM steering tokens outperform instruction-based and activation-steering baselines for multi-behavior composition: on unseen 2- and 3-behavior combinations, they achieve 5–8 pp higher mean accuracy (e.g., 76.9% vs. 71.8%), show lower variance across behavior orderings, and remain robust as model size scales (Radevski et al., 8 Jan 2026).
DeFi Layering and Systemic Leverage: By late 2025, each dollar of base-layer assets supported about 4.7 dollars of total claims (a layering multiplier of roughly 4.7), with lending and staking driving 83% of all token stacking. Yield declines by 2.95 pp per additional composition hop, and crisis periods amplify tier premiums and systemic risk (Wu, 2 Mar 2026).
Two-Tier Asset Bundling: Real-world asset tokenization achieves near-instant arbitrage alignment of the bundle token's price with NAV and enables fine-grained ownership, market segmentation (e.g., hedging only CO$_2$ exposure), and programmable revenues, all enforced algorithmically by composition token mechanics (Borjigin et al., 15 Aug 2025).
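The arbitrage rule behind NAV alignment can be sketched as a decision function. Assuming a single proportional cost $f$ that covers minting or redeeming plus trading (a simplification; real venues charge such costs separately), a convertibility trade is profitable only when the bundle price $P_E$ leaves the band $[\mathrm{NAV}(1-f),\ \mathrm{NAV}(1+f)]$. The function name, the action labels, and all prices are hypothetical.

```python
def arbitrage_action(price_e: float, weights: dict[str, float],
                     element_prices: dict[str, float], fee: float) -> str:
    """Return the profitable convertibility trade, if any, for bundle token E.

    NAV = sum_i w_i * P(e_i). With a single proportional cost `fee` (an assumption):
      price_e > NAV * (1 + fee): buy elements, mint E, sell E (pushes P_E down)
      price_e < NAV * (1 - fee): buy E, redeem it, sell elements (pushes P_E up)
      otherwise: inside the no-arbitrage band, so no trade pays
    """
    nav = sum(w * element_prices[name] for name, w in weights.items())
    if price_e > nav * (1 + fee):
        return "mint-and-sell"
    if price_e < nav * (1 - fee):
        return "buy-and-redeem"
    return "hold"


# Hypothetical bundle and prices: NAV = 1 * 10.0 + 2 * 4.0 = 18.0, band = [17.82, 18.18].
w = {"land": 1.0, "carbon": 2.0}
p = {"land": 10.0, "carbon": 4.0}
print(arbitrage_action(18.5, w, p, fee=0.01))  # mint-and-sell
print(arbitrage_action(18.1, w, p, fee=0.01))  # hold
print(arbitrage_action(17.5, w, p, fee=0.01))  # buy-and-redeem
```

Each profitable trade moves $P_E$ back toward the band, which is why the guarantee is a band around NAV rather than exact equality.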

5. Risks, Limitations, and Security

Composition token architectures are subject to modality-specific risks:

Financial Systems: Deep token composition increases systemic vulnerability; significant shocks or failures at foundational asset layers (e.g., stablecoins) can propagate through composition chains due to transitive risk (Harrigan et al., 2024, Wu, 2 Mar 2026). Arbitrage mechanisms and explicit compositional graphs offer partial mitigation by making dependencies auditable.
Security in LLM Tokenizer Expansion: Vulnerabilities such as "breaker tokens" arise during tokenizer transplant in modular LLM systems. These can be engineered to remain inert in donor models but activate malicious features in recipient models by means of coefficient-reuse and dual-objective optimization, evading standard spectral audits and even persisting through model fine-tuning or merging (Liu et al., 31 Dec 2025). Defense strategies include behavioral auditing (e.g., sequence emission rate differentials), restricting coefficient reuse, and cryptographic metadata.
Representation Learning: Spectrum-decoupling remains nontrivial; in many 1D tokenizers, semantic and spectral information are entangled, hindering interpretability. Diffusion decoders and attention mechanisms are employed to ameliorate such coupling (Wen et al., 11 Mar 2025).
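Transitive exposure in a composition graph can be traced by walking claim edges in reverse from a failed asset with breadth-first search. A minimal sketch, assuming the graph is a mapping from each token to the lower-tier tokens it directly claims on; all token names except ETH and stETH are hypothetical, and the result identifies which tokens are exposed, not how large their losses would be.

```python
from collections import deque


def exposed_tokens(claims: dict[str, list[str]], failed: str) -> set[str]:
    """Return every token that directly or transitively claims on `failed`."""
    # Invert the edges: for each asset, which tokens are built on top of it?
    dependents: dict[str, list[str]] = {}
    for token, underlying in claims.items():
        for u in underlying:
            dependents.setdefault(u, []).append(token)

    exposed: set[str] = set()
    queue = deque([failed])
    while queue:
        asset = queue.popleft()
        for token in dependents.get(asset, []):
            if token not in exposed:  # visited check also stops traversal on cycles
                exposed.add(token)
                queue.append(token)
    return exposed


claims = {
    "aStable": ["stableX"],         # hypothetical lending receipt on a stablecoin
    "lpToken": ["stableX", "ETH"],  # hypothetical liquidity-pool share
    "vaultShare": ["lpToken"],      # hypothetical vault built on the pool share
    "stETH": ["ETH"],
}
print(sorted(exposed_tokens(claims, "stableX")))  # ['aStable', 'lpToken', 'vaultShare']
```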

6. Extensions and Cross-Domain Generalization

Recent work generalizes composition token principles across diverse modalities:

Audio, video, and multimodal data: The nested-dropout + diffusion-decoder recipe underlying visual composition tokens is directly extendable to audio and multimodal tokenization, yielding sequences of variance-ordered, orthogonal tokens for tasks such as speech representation and multi-modal fusion (Wen et al., 11 Mar 2025, Wu et al., 2024).
Task-robustness via prompt composition: Simple concatenation of primitive prompt tokens scales to unseen composite speech and language tasks, often yielding zero-shot generalization with negligible performance degradation (Wu et al., 2024, Radevski et al., 8 Jan 2026).
Graph-based analysis in tokenized finance: Composition graphs empirically map system interconnection, centrality, and risk, while also serving as a foundation for macro-prudential regulation (Harrigan et al., 2024).
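Both forms of prompt-level composition reduce to sequence construction. The sketch concatenates primitive task prompts and interleaves steering tokens with a composition token; every token string (`<speech>`, `<text>`, `<and>`, and so on), the `PRIMITIVES` table, and both helper names are hypothetical placeholders, not the actual vocabularies of SpeechComposer or Radevski et al.

```python
# Hypothetical primitive prompt sequences; real models define their own vocabularies.
PRIMITIVES: dict[str, list[str]] = {
    "asr": ["<speech>", "<text>"],  # speech in, text out
    "tts": ["<text>", "<speech>"],  # text in, speech out
}


def compose_task(*names: str) -> list[str]:
    """Composite task prompt: the concatenation of its primitives' prompt sequences."""
    prompt: list[str] = []
    for name in names:
        prompt.extend(PRIMITIVES[name])
    return prompt


def compose_behaviors(steering: list[str], and_token: str = "<and>") -> list[str]:
    """Interleave steering tokens with a composition token: s1 <and> s2 <and> s3."""
    prompt: list[str] = []
    for i, token in enumerate(steering):
        if i > 0:
            prompt.append(and_token)
        prompt.append(token)
    return prompt


print(compose_task("asr", "tts"))  # ['<speech>', '<text>', '<text>', '<speech>']
print(compose_behaviors(["<formal>", "<brief>", "<french>"]))
# ['<formal>', '<and>', '<brief>', '<and>', '<french>']
```

The model weights are untouched by either helper; whether such prompts generalize to unseen combinations depends entirely on how the primitive and composition tokens were trained, which this sketch does not model.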

A plausible implication is that explicit compositional token architectures—anchored by provable information separation, hierarchical structuring, or on-chain convertibility—will continue to bridge modeling efficiency, transparency, and systemic robustness across AI and decentralized infrastructure.