Sticky Tokens in Blockchain, NLP & Systems
- Sticky tokens are elements whose inherent behaviors cause state persistence or dominance across blockchain protocols, NLP models, and capability systems.
- In blockchain systems like BRC20, sticky tokens exploit two-stage transfer fee dynamics, leading to liquidity locks and potential asset pinning during network congestion.
- In NLP and secure systems, sticky tokens either degrade clustering performance via embedding mean attraction or enforce strict control flows to prevent capability misuse.
“Sticky tokens” are a technical phenomenon appearing across disparate areas: blockchain asset protocols (notably the BRC20 standard), machine learning-based text embedding models, and secure systems architectures using linear capabilities. Across contexts, “sticky token” refers to a token whose intrinsic behavior or vulnerability causes state persistence, undue influence, or protocol-level inflexibility—often with critical security or reliability consequences.
1. Sticky Tokens in BRC20 and Inscription-Based Blockchains
In the BRC20 token ecosystem on Bitcoin, sticky tokens arise from a vulnerability tied to the two-step transfer protocol based on the Bitcoin UTXO model. BRC20 tokens are implemented by inscribing JSON-style data onto individual satoshis using OP_FALSE/Taproot witnesses. A BRC20 transfer from sender to recipient is realized as two distinct on-chain transactions:
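- Tx₁ (InscribeTransfer): encodes the transfer intent by sending a UTXO back to the sender that carries an inscription with "op":"transfer", "tick":"XXX", and "amt" set to the transfer amount. This reduces the sender's available balance by the transfer amount and increments their transferable balance by the same amount.
- Tx₂ (ExecuteTransfer): consumes the inscribed UTXO to pay out to the recipient. The sender's transferable balance is reduced by the transfer amount, and the recipient's available balance increases by that amount.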
The off-chain invariant is that each address's overall balance equals the sum of its available and transferable balances.
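The two-step bookkeeping can be illustrated with a minimal sketch. It assumes the invariant is overall = available + transferable; the `Ledger` and `Account` classes, their method names, and the addresses are hypothetical and do not reproduce any real indexer's API.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    available: int = 0      # can be committed to a new InscribeTransfer
    transferable: int = 0   # locked inside pending transfer inscriptions

    @property
    def overall(self) -> int:
        # Assumed invariant: overall = available + transferable.
        return self.available + self.transferable


@dataclass
class Ledger:
    accounts: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)  # inscription id -> (sender, amount)

    def account(self, addr: str) -> Account:
        return self.accounts.setdefault(addr, Account())

    def inscribe_transfer(self, insc_id: str, sender: str, amount: int) -> None:
        """Tx1: move `amount` from available to transferable for the sender."""
        acct = self.account(sender)
        if amount <= 0 or amount > acct.available:
            raise ValueError("invalid transfer amount")
        acct.available -= amount
        acct.transferable += amount
        self.pending[insc_id] = (sender, amount)

    def execute_transfer(self, insc_id: str, recipient: str) -> None:
        """Tx2: release the inscribed amount to the recipient."""
        sender, amount = self.pending.pop(insc_id)
        self.account(sender).transferable -= amount
        self.account(recipient).available += amount


ledger = Ledger()
ledger.account("alice").available = 100
ledger.inscribe_transfer("insc-1", "alice", 40)
# Between Tx1 and Tx2, 40 units sit in alice's transferable balance.
print(ledger.account("alice"))
ledger.execute_transfer("insc-1", "bob")
print(ledger.account("alice"), ledger.account("bob"))
```

In this model, the interval between `inscribe_transfer` and `execute_transfer` is exactly the state in which tokens can neither be re-spent by the sender nor used by the recipient.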
An attacker exploits fee market dynamics: Tx₁ typically clears at a relatively low fee rate, while Tx₂ requires a substantially higher fee rate to be mined promptly under congestion. The attack places the fee rate in the window between the minimum acceptable fee rate for mining Tx₁ and the minimum for Tx₂. This causes Tx₁ to be confirmed and Tx₂ to be persistently queued (“pinned”) in the mempool. As long as the fee rate stays below the Tx₂ minimum and mempool congestion remains high, Tx₂ is likely to remain unconfirmed over many consecutive blocks.
Tokens are then “sticky” in the sender's transferable balance, locking the associated liquidity.
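The fee window described above can be expressed as a small classifier. This is only a sketch: the function name, the parameter names, and the example thresholds (20 and 100 sat/vB) are hypothetical, and real confirmation depends on mempool conditions that a static threshold does not capture.

```python
def transfer_outcome(fee_rate: float, min_fee_tx1: float, min_fee_tx2: float) -> str:
    """Classify a fee rate against the two assumed confirmation thresholds."""
    if min_fee_tx1 > min_fee_tx2:
        raise ValueError("expected min_fee_tx1 <= min_fee_tx2")
    if fee_rate < min_fee_tx1:
        return "neither confirms"
    if fee_rate < min_fee_tx2:
        # Tx1 is mined, Tx2 waits in the mempool: the pinning window.
        return "pinned"
    return "both confirm"


# Hypothetical thresholds in sat/vB, chosen only for illustration.
for rate in (10, 50, 150):
    print(rate, transfer_outcome(rate, min_fee_tx1=20, min_fee_tx2=100))
```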
Empirically, this attack was demonstrated against Binance’s ORDI hot wallet. Over multiple close-timed transactions, up to 8.21 million ORDI (approx. \$9M) became unspendable for 3.5 hours, requiring manual fee bumping to release the tokens. The vulnerability generalizes: 14 of the 15 largest inscription-token protocols using the UTXO/two-step model (93.3%) are susceptible, including BRC20, DRC20 (Dogecoin), and ARC20 (Bitcoin L2). Only EVM-derived single-step protocols evade this flaw (Qi et al., 2024).
2. Sticky Tokens in Text Embedding Models
In transformer-based text embedding architectures, “sticky tokens” are defined as tokens that, when repeatedly inserted into sentences (as a prefix, as a suffix, or at random positions), drive the resulting embedding similarity toward the mean pairwise similarity of token embeddings, where similarity is the cosine similarity between outputs of the embedding function. A token is sticky if, for all sentence pairs and insertion patterns, the post-insertion similarity is pulled toward this mean, judged against a model-specific threshold and for a suitably chosen number of insertions.
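The ingredients of this definition (cosine similarity, a mean pairwise similarity, and the three insertion patterns) can be sketched in plain Python. The function names, the `embed` callable, and the "pull toward the mean" score are assumptions made for illustration; the exact criterion and threshold used by Chen et al. are not reproduced here.

```python
import math
import random
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def mean_pairwise_similarity(vectors):
    """Average cosine similarity over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two vectors")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def insert_token(sentence, token, n, pattern, rng=None):
    """Insert `token` n times as a prefix, a suffix, or at random positions."""
    words = sentence.split()
    if pattern == "prefix":
        words = [token] * n + words
    elif pattern == "suffix":
        words = words + [token] * n
    elif pattern == "random":
        rng = rng or random.Random(0)
        for _ in range(n):
            words.insert(rng.randrange(len(words) + 1), token)
    else:
        raise ValueError(f"unknown pattern: {pattern}")
    return " ".join(words)


def pull_toward_mean(embed, s1, s2, token, n, pattern, mu):
    """Positive when insertion moves the pair similarity closer to mu.

    `embed` is any callable mapping a string to a vector (hypothetical here).
    """
    before = cosine(embed(s1), embed(s2))
    after = cosine(embed(insert_token(s1, token, n, pattern)), embed(s2))
    return abs(before - mu) - abs(after - mu)
```

A candidate token would then be scored by evaluating `pull_toward_mean` over many low-similarity sentence pairs and all three patterns, with the decision threshold left as a model-specific choice.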
Detection proceeds by computing candidate sets through filtering (e.g., removing undecodable tokens), sticky scoring over random low-similarity sentence pairs, and exhaustive validation. Experiments across 40 models and 14 families reveal 868 sticky tokens (0.006%–1% of filtered vocabulary), including special tokens (e.g., </s>, <extra_id_X>, [CLS]), rare subwords, and multilingual fragments. There is no significant correlation with model or vocabulary size.
Sticky tokens can degrade downstream clustering and retrieval by up to 50% in task metrics, in contrast to perturbations of around 5% or less for random “normal” tokens. Attention analysis finds that sticky tokens receive disproportionately high attention weights in deeper layers, leading to representational domination and loss of discriminative power (Chen et al., 24 Jul 2025).
3. Sticky Tokens and Capability Machine Calling Conventions
In systems security, “sticky tokens” is used colloquially for the StkTokens calling convention for CHERI-like capability machines, which is based on linear capabilities. Each capability carries a linearity attribute, and a capability is linear when that attribute marks it as linear rather than normal. Linear capabilities cannot be duplicated: any instruction that would copy one (MOVE, STORE, LOAD) clears the source to 0, guaranteeing that no two aliases exist.
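The clearing behavior can be modeled with a toy register file. This is a simplified sketch under stated assumptions: the `Capability` fields and the `RegisterFile` class are hypothetical, only the move instruction is modeled, and no real capability-machine ISA is being reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    base: int
    end: int
    linear: bool  # True for linear capabilities, False for normal ones


class RegisterFile:
    def __init__(self):
        self.regs = {}

    def write(self, name, value):
        self.regs[name] = value

    def read(self, name):
        # Unset registers read as 0 in this toy model.
        return self.regs.get(name, 0)

    def move(self, dst, src):
        """Copy src to dst; a linear capability is cleared from src."""
        if dst == src:
            return
        value = self.read(src)
        self.regs[dst] = value
        if isinstance(value, Capability) and value.linear:
            self.regs[src] = 0  # the source is zeroed, so no alias remains


rf = RegisterFile()
rf.write("r1", Capability(base=0x1000, end=0x2000, linear=True))
rf.move("r2", "r1")
print(rf.read("r1"), rf.read("r2"))  # r1 now holds 0, r2 holds the capability
```

Because the source register is zeroed, code that hands its stack capability to a callee cannot keep a second copy to use later.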
The StkTokens calling convention compiles a single scall pseudo-instruction into a sequence that splits the linear stack capability, seals private stack frames and the return PC under a global seal, and ensures that all returns are well-bracketed. Hardware-enforced linearity prevents the reuse or stashing of return capabilities, thus compelling strict stack encapsulation and well-bracketed control flow.
The design is formalized via a fully abstract overlay semantics, with contextual equivalence theorems proving that an attacker context can distinguish no more (e.g., via mis-returns) than it could in the ideal source model. The clearing of linear capabilities blocks any aliasing or multi-use, preventing adversarial manipulation of the stack despite arbitrary splitting/splicing (Skorstengaard et al., 2018).
4. Quantitative Impact and Empirical Evidence
Blockchain Example: In the Binance ORDI case, sticky tokens induced by the pinning attack led to a complete withdrawal halt for 3.5 hours. Recovery required fee bumping to 404 sat/vB for all delayed Tx₂s, validating the attack mechanism under real market and fee conditions (Qi et al., 2024).
NLP Example: Across 11 embedding models and 15 representative tasks (from clustering to retrieval), sticky tokens yield mean performance drops of up to 35–54% (e.g., Biorxiv clustering drops from 23.11 to 15.02; NFCorpus retrieval from 28.64 to 13.65). A paired t-test over the 15 tasks confirms statistical significance (Chen et al., 24 Jul 2025).
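As a quick check, the relative drops implied by the two quoted score pairs can be computed directly. The helper name is illustrative; the scores are the ones given above.

```python
def relative_drop(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


examples = [
    ("Biorxiv clustering", 23.11, 15.02),
    ("NFCorpus retrieval", 28.64, 13.65),
]
for name, before, after in examples:
    print(f"{name}: {relative_drop(before, after):.1f}% drop")
# Biorxiv clustering: 35.0% drop
# NFCorpus retrieval: 52.3% drop
```

Both values fall inside the quoted 35–54% range.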
Capability Machine Example: Rigorous stepwise semantics combined with a Kripke logical relation demonstrate that StkTokens enforce well-bracketed discipline and forbid practical and theoretical attempts to steal or reuse stack frames, even in adversarial contexts (Skorstengaard et al., 2018).
5. Mechanistic Explanation and Theoretical Interpretation
Sticky tokens exploit protocol-level state bifurcation, model anisotropy, or hardware enforcement to produce “sticky” behaviors:
- Blockchain: The two-step transfer splits the state into available and transferable balances, allowing adversaries to exploit differential confirmation probabilities under congestion, pinning assets in a semantically incomplete transfer state.
- Embeddings: Certain tokens, often arising unintentionally from tokenization artifacts or training data anomalies, dominate self-attention in later layers, skewing global sentence representations toward the mean of the embedding space (mean attraction).
- Capabilities: The hardware-mandated linearity property, combined with sealing, prevents stack capability reuse, stopping attacks that would otherwise escape well-bracketed return/call order.
A plausible implication is that “stickiness” indicates a more general pattern: protocols with multi-stage handshakes or representational spaces lacking explicit isotropy control are at risk for persistent or dominating elements.
6. Detection and Mitigation Strategies
Blockchain Tokens:
- Dynamic fee floor: Always enforce a fee rate at or above the current minimum needed for Tx₂ to confirm, preventing transfers at vulnerable fee rates.
- Real-time pinning detection: Monitor pending inscription transfers and automatically use Replace-By-Fee to raise Tx₂ fees past that minimum when timeouts are exceeded.
- Hierarchical authorization: Require MFA or multisig approval for large-value transfers to decrease attack surface.
These countermeasures follow Qi et al. (2024).
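The fee floor and the timeout-driven fee bump can be combined in a short planning routine. This is a sketch rather than the authors' procedure: `PendingTransfer`, `plan_fee_bumps`, the `margin` parameter, and the example values are all hypothetical, and actually broadcasting a Replace-By-Fee replacement through a wallet or node is outside its scope.

```python
from dataclasses import dataclass


@dataclass
class PendingTransfer:
    txid: str
    fee_rate: float      # sat/vB currently attached to Tx2
    age_seconds: float   # time spent unconfirmed in the mempool


def plan_fee_bumps(pending, min_fee_tx2, timeout_seconds, margin=1.1):
    """Return {txid: new_fee_rate} for transfers that look pinned.

    A transfer is flagged when it has waited longer than `timeout_seconds`
    and its fee rate is below the target (the assumed Tx2 minimum times
    a safety margin).
    """
    target = min_fee_tx2 * margin
    bumps = {}
    for tx in pending:
        if tx.age_seconds > timeout_seconds and tx.fee_rate < target:
            bumps[tx.txid] = target
    return bumps


queue = [
    PendingTransfer("tx-a", fee_rate=30.0, age_seconds=1800),
    PendingTransfer("tx-b", fee_rate=30.0, age_seconds=60),
    PendingTransfer("tx-c", fee_rate=500.0, age_seconds=1800),
]
print(plan_fee_bumps(queue, min_fee_tx2=100.0, timeout_seconds=600))
```

Only `tx-a` is selected here: `tx-b` has not yet timed out, and `tx-c` already pays more than the target.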
Embedding Models:
- Tokenizer sanitization: Remove special, unused, or low-frequency tokens (example: <extra_id_X>, foreign script fragments) before fine-tuning.
- Runtime detection: Scan incoming text for sticky token occurrences; exclude or replace as needed in search or retrieval settings.
- Regularized training: Employ objectives (e.g., isotropic loss, whitening) to mitigate mean-pulling directions (Chen et al., 24 Jul 2025).
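Runtime detection can be as simple as filtering a token sequence against a known sticky set. The sketch below assumes such a set has already been produced by a detection pass for the model in use; the function name and the example tokens are illustrative only.

```python
def scrub_sticky_tokens(tokens, sticky, replacement=None):
    """Drop sticky tokens, or substitute `replacement` when one is given."""
    cleaned = []
    for tok in tokens:
        if tok in sticky:
            if replacement is not None:
                cleaned.append(replacement)
        else:
            cleaned.append(tok)
    return cleaned


# Hypothetical sticky set; a real one is model-specific.
sticky_set = {"</s>", "<extra_id_0>"}
query = ["protein", "</s>", "folding", "<extra_id_0>", "methods"]
print(scrub_sticky_tokens(query, sticky_set))
# ['protein', 'folding', 'methods']
```

Dropping is the simplest policy; substituting a neutral placeholder keeps token positions aligned when downstream code depends on sequence length.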
Capability Machines:
- Enforce execution semantics that clear linear capabilities on move/load/store and utilize sealed capabilities for stack and return management.
7. Conclusion and Research Outlook
Sticky tokens represent a broad class of protocol, model, or hardware-level elements whose state “sticks” or unduly dominates, leading to vulnerabilities or degraded performance. In BRC20, they emerge from mempool pinning attacks inherent in two-step transfer mechanisms. In transformer embedding models, they are tokens whose repeated presence collapses pairwise similarity structure, reducing the expressiveness of the model. In capability machines, sticky tokens enforce—rather than undermine—security by rigorously blocking aliasing and enforcing stack encapsulation.
Across domains, sticky tokens highlight the critical importance of holistic system design—spanning protocol mechanics, representational geometry, and hardware semantics. Effective mitigations require dynamic controls, real-time monitoring, and carefully calibrated abstractions. Their study continues to influence both cryptoeconomic systems and foundational machine learning architectures (Qi et al., 2024, Chen et al., 24 Jul 2025, Skorstengaard et al., 2018).