Token Velocity in Digital Systems

Updated 1 February 2026
  • Token velocity is a family of metrics quantifying the movement and turnover of tokens in a system, with applications spanning on-chain finance, LLM serving, and Transformer internals.
  • It employs methods like micro-velocity analysis on-chain and throughput metrics in LLM serving to optimize system performance and resource allocation.
  • Empirical studies demonstrate its impact through high token turnover rates, heterogeneous usage among participants, and enhanced scheduling in digital pipelines.

Token velocity refers to a family of quantitative metrics characterizing the rate at which tokens—whether representing digital assets, information-carrying vectors in neural models, or computation units in distributed systems—are transferred, processed, or evolve across temporal or algorithmic boundaries. The unifying theme across applications is the interpretation of "velocity" as an indicator of system dynamism, throughput, or turnover, providing fine-grained insight into the movement and utilization of tokens beyond simple aggregate counts.

1. Micro-Velocity in On-Chain Asset Circulation

The concept of token velocity originated in monetary economics as the "velocity of money," operationalized as the total value of transactions over the money supply. Extending this to on-chain assets, particularly liquid staking tokens (LSTs) such as stETH and wstETH on Ethereum, micro-velocity provides an address-level metric of turnover, capturing agent-level heterogeneity in token usage (Kraner et al., 21 Aug 2025).

Let $w_i^t(\tau)$ denote the amount of tokens held by address $i$ at time $t$ with holding age $\tau$, and $M_i(t) = \sum_\tau w_i^t(\tau)$ the total balance. The empirical age distribution at $i$ is $P_i^t(\tau) = w_i^t(\tau)/M_i(t)$. Micro-velocity for address $i$ is

$$V_i(t) = \sum_\tau \frac{1}{\tau}\, P_i^t(\tau)$$

with global velocity given by the balance-weighted mean,

$$V(t) = \frac{\sum_i M_i(t)\,V_i(t)}{\sum_i M_i(t)}.$$

This approach enables behavioral decomposition by cohort (e.g., "whales", "retail"), with $V_C(t)$ denoting category $C$'s velocity and $s_C(t) = \frac{M_C(t)\,V_C(t)}{M(t)\,V(t)}$ its share.
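The per-address and balance-weighted formulas above can be sketched directly in code. This is an illustrative toy implementation on hypothetical holding-age data, not the authors' pipeline; each address maps holding age (in blocks) to the amount held at that age.

```python
def micro_velocity(age_amounts):
    """V_i(t) = sum_tau (1/tau) * P_i^t(tau), where P_i^t(tau) = w_i^t(tau)/M_i(t)."""
    total = sum(age_amounts.values())  # M_i(t)
    return sum(amount / (age * total) for age, amount in age_amounts.items())

def global_velocity(addresses):
    """Balance-weighted mean of per-address micro-velocities."""
    balances = {i: sum(w.values()) for i, w in addresses.items()}  # M_i(t)
    m = sum(balances.values())                                     # M(t)
    return sum(balances[i] * micro_velocity(w) for i, w in addresses.items()) / m

# Hypothetical cohorts: a fast-turnover "whale" and a dormant retail holder.
whale = {1: 500.0, 2: 300.0}   # young holdings -> high 1/tau contributions
retail = {1000: 10.0}          # old holdings -> near-zero velocity
print(micro_velocity(whale))   # → 0.8125
print(global_velocity({"whale": whale, "retail": retail}))
```

Because the global velocity is balance-weighted, the whale cohort dominates $V(t)$ here, mirroring the paper's finding that turnover concentrates among large accounts.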

Empirical findings reveal exceptionally high global velocities ($10^{-2}$–$10^{-1}$ blocks$^{-1}$) for both stETH and wstETH, persistent concentration of turnover among large institutional accounts, and a market-driven transition from rebasing stETH to the composable, non-rebasing wstETH, which dominates DeFi protocol usage (Kraner et al., 21 Aug 2025).

2. Token Velocity in LLM Inference and Serving Systems

In distributed LLM serving architectures, token velocity quantifies the processing capacity or flow rate (tokens/second) of each stage—prefill, network, or decode—under current resource allocation (Lai et al., 3 Dec 2025). Formally, for the prefill stage,

$$V_P = \max_{\text{prefill load}} (\text{tokens completed}/\text{sec}),$$

while the decode stage is parameterized by

$$V_D = \frac{\sum_{r\in R} L_r}{\mathrm{TPOT}},$$

where $R$ is the set of completed decode requests and $L_r$ the total tokens per request.

Token velocity functions as a leading indicator for proactive autoscaling, enabling systems such as TokenScale to match resource provisioning to real-time demand by comparing instantaneous token arrival rates to per-stage capacity. This allows rapid adaptation to bursty workloads, reducing SLO violations (e.g., TTFT and TPOT) and over-provisioning compared to indicators such as GPU utilization or queue length. Convertible Decoders leverage velocity metrics to dynamically multiplex prefill and decode tasks, preserving SLOs during demand spikes (Lai et al., 3 Dec 2025).
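A minimal sketch of a velocity-driven scaling decision in the spirit described above. The decode-stage formula follows the definition of $V_D$; the thresholding policy (`headroom`, scale-down cutoff) is an assumption for illustration, not TokenScale's actual controller.

```python
def decode_velocity(completed_request_lengths, tpot_seconds):
    """V_D = (total tokens over completed decode requests R) / TPOT."""
    return sum(completed_request_lengths) / tpot_seconds

def scale_decision(arrival_rate_tps, stage_velocity_tps, headroom=0.8):
    """Compare instantaneous token arrival rate to per-stage capacity.
    Hypothetical policy: scale up *before* saturation, so velocity acts as
    a leading indicator rather than a lagging one like GPU utilization."""
    if arrival_rate_tps > headroom * stage_velocity_tps:
        return "scale_up"
    if arrival_rate_tps < 0.3 * stage_velocity_tps:
        return "scale_down"
    return "hold"

v_d = decode_velocity([128, 256, 64], tpot_seconds=0.05)  # 448 tokens / 0.05 s
print(scale_decision(8000, v_d))  # arrivals near capacity → "scale_up"
```

The key design point is that the comparison uses token flow rates on both sides, so a burst shows up in the arrival rate immediately, before queues build or latency SLOs are violated.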

3. Token Velocity in Preemptive LLM Streaming and Scheduling

In real-time text-generation pipelines such as TokenFlow, token velocity metrics underpin the prioritization and scheduling of requests under bursty conditions (Chen et al., 3 Oct 2025). Key quantities include:

  • Time-to-first-token (TTFT): $t_i^{\mathrm{ttft}} = t_{i,1}^{\mathrm{gen}} - t_i^0$
  • Time-between-tokens (TBT): $\delta_{i,j} = t_{i,j}^{\mathrm{gen}} - t_{i,j-1}^{\mathrm{gen}}$

Effective throughput refines raw tokens/sec by weighting each token by its buffer occupancy:
$$\mathrm{EffThruput} = \frac{1}{T} \sum_{i=1}^N \sum_{j=1}^{L_i} w_{i,j},$$
with $w_{i,j}$ decreasing when token buffers overflow and increasing when tokens are consumed in a timely fashion.
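The effective-throughput metric can be sketched as follows. The linear discount rule for $w_{i,j}$ here is a simplifying assumption (the paper derives weights from buffer-occupancy signals); only the outer sum matches the formula above exactly.

```python
def effective_throughput(token_weights_per_request, window_seconds):
    """EffThruput = (1/T) * sum over requests i, tokens j of w_ij."""
    return sum(sum(ws) for ws in token_weights_per_request) / window_seconds

def token_weight(buffer_occupancy, capacity):
    """Hypothetical weighting: discount tokens streamed into a nearly full
    client buffer, since they do not improve user-perceived throughput."""
    return max(0.0, 1.0 - buffer_occupancy / capacity)

# Two requests over a 1-second window; the second streams into a congested buffer.
weights = [
    [token_weight(b, capacity=100) for b in (10, 12, 15)],   # healthy consumer
    [token_weight(b, capacity=100) for b in (90, 95, 100)],  # overflowing buffer
]
print(effective_throughput(weights, window_seconds=1.0))
```

Both requests emit three tokens, but the congested request contributes almost nothing to effective throughput, which is exactly the signal a buffer-aware scheduler needs to justify preempting it.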

Token-velocity-based priority functions dynamically admit, preempt, or resume requests to optimize responsiveness and user-perceived throughput. Proactive, overlapped migration of key-value caches is coordinated largely by token velocity and buffer-driven utility signals, delivering large gains in effective throughput and latency under both simulated and production loads (Chen et al., 3 Oct 2025).

4. Token Velocity as a Learned or Geometric Feature in Transformers

Token velocity also appears as an abstract feature within the deep layers of Transformer architectures. In OrthoRank, token velocity quantifies the rate at which a given token's representation moves (via cosine similarity) toward a nearly stationary "sink token" as layers deepen (Shin et al., 5 Jul 2025). For normalized hidden states $\bar h_i^l$ and sink state $\bar s^l$, the cosine-based velocity is
$$v_i^l = \cos(\bar s^{l+1}, \bar h_i^{l+1}) - \cos(\bar s^l, \bar h_i^l).$$
OrthoRank demonstrates that the magnitude of the gradient of this similarity is proportional to the squared orthogonality $1-\cos^2(\bar s^l,\bar h_i^l)$, and uses this as an importance score: tokens least aligned with the sink (i.e., carrying the fastest new information) are selected for further computation, reducing inference cost while often improving perplexity and accuracy (Shin et al., 5 Jul 2025).
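The orthogonality-based importance score can be sketched as below. Shapes and the selection interface are assumptions for illustration, not the released OrthoRank code; only the score $1-\cos^2(\bar s^l, \bar h_i^l)$ follows the formula above.

```python
import numpy as np

def orthorank_scores(sink, hidden):
    """sink: (d,), hidden: (n, d). Importance = 1 - cos^2(sink, h_i) per token."""
    s = sink / np.linalg.norm(sink)
    h = hidden / np.linalg.norm(hidden, axis=1, keepdims=True)
    cos = h @ s                      # cosine similarity of each token to the sink
    return 1.0 - cos ** 2            # squared orthogonality

def select_tokens(sink, hidden, keep):
    """Indices of the `keep` tokens least aligned with the sink."""
    scores = orthorank_scores(sink, hidden)
    return np.argsort(scores)[::-1][:keep]

rng = np.random.default_rng(0)
sink = rng.normal(size=8)
hidden = np.vstack([sink * 2.0, rng.normal(size=(3, 8))])  # token 0 ∥ sink
print(select_tokens(sink, hidden, keep=2))  # the sink-aligned token 0 is dropped first
```

Token 0 is parallel to the sink, so its squared orthogonality is ~0 and it is the first candidate for skipping, while near-orthogonal tokens (those "moving fastest" toward the sink) keep full computation.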

5. Continuous-Time Kinematic Token Velocity for Sequential Decision Policies

In the Kinematic Tokenization framework for noisy time series, token velocity is explicitly constructed as the first derivative $c_{1,k}$ of a cubic spline fitted to log-price data (Kearney, 15 Jan 2026). Each segment of the time series is parameterized by position $c_{0,k}$, velocity $c_{1,k}$, acceleration $c_{2,k}$, and jerk $c_{3,k}$ coefficients, extracted from a variational spline denoising objective:
$$\min_{x(t),\, v(t),\, \{w_k\}} \int_{t_0}^{t_N} \frac{1}{2} v(t)^2\, dt + \frac{\alpha^2}{2} \sum_{k=0}^N w_k^2, \quad \text{subject to } y_k = x(t_k) + w_k,\ \ddot x(t) = v(t).$$
The velocity token $c_{1,k}$ is a denoised leading indicator of momentum, allowing Transformer-based policies to discriminate between genuine signal and noise, especially under risk-averse, abstention-inducing loss functions (Kearney, 15 Jan 2026). Empirically, action calibration and risk-adjusted returns are only attainable with this denoised velocity token; policies collapse under discrete or finite-difference tokenizations.
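A minimal sketch of extracting a velocity token as the first-derivative coefficient of a locally fitted cubic. A plain least-squares cubic fit stands in for the paper's variational spline objective (an assumption made for brevity); the point illustrated is that the fitted $c_{1}$ recovers drift from noisy observations where a finite difference would be dominated by noise.

```python
import numpy as np

def kinematic_tokens(t, y):
    """Fit y ≈ c0 + c1·(t-t0) + c2·(t-t0)² + c3·(t-t0)³ on one segment and
    return (c0, c1, c2, c3) = (position, velocity, acceleration, jerk)."""
    tau = t - t[0]
    c3, c2, c1, c0 = np.polyfit(tau, y, deg=3)  # polyfit: highest degree first
    return c0, c1, c2, c3

# Synthetic "log-price" segment: constant drift of 0.05 plus observation noise.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 50)
log_price = 0.05 * t + 0.001 * rng.normal(size=t.size)

_, velocity, _, _ = kinematic_tokens(t, log_price)
print(f"velocity token ≈ {velocity:.3f}")  # close to the true 0.05 drift
```

In contrast, the raw one-step finite difference `np.diff(log_price)` has a noise amplitude comparable to the per-step drift here, which is the failure mode the denoised velocity token avoids.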

6. Empirical Insights and Applications

Direct measurement and decomposition of token velocity have produced notable insights across domains:

  • In DeFi, token velocity quantifies the concentration of activity among large holders and the evolving composability of LSTs in protocol infrastructure, enabling granular monitoring of on-chain money dynamics (Kraner et al., 21 Aug 2025).
  • In LLM serving and streaming, velocity-based metrics optimize both system SLO compliance and resource usage, enabling proactive scaling and preemptive scheduling that dramatically improve practical throughput and latency (Lai et al., 3 Dec 2025, Chen et al., 3 Oct 2025).
  • In neural networks, velocity as a geometric or kinematic feature underlies efficient token selection and more robust sequential decision policies under adversarial or noisy conditions (Shin et al., 5 Jul 2025, Kearney, 15 Jan 2026).

7. Comparative Summary of Token Velocity Metrics

| Context | Velocity metric | Role/impact |
|---|---|---|
| On-chain assets (Kraner et al., 21 Aug 2025) | Micro-velocity $V_i(t)$ per address | Quantifies unit-level circulation; shows usage skew |
| LLM serving (Lai et al., 3 Dec 2025) | Max tokens/sec per pipeline stage | Proactive autoscaling, burst absorption |
| LLM streaming (Chen et al., 3 Oct 2025) | TTFT, TBT, weighted EffThruput | Responsive scheduling, preemption via buffer-aware utility |
| Transformer internals (Shin et al., 5 Jul 2025) | Cosine-similarity velocity on hypersphere | Token selection by geometric importance |
| Financial signals (Kearney, 15 Jan 2026) | Spline-derivative $c_{1,k}$ | Denoised sequential decision input |

The diversity of definitions reflects the centrality of token velocity as a unifying metric of flow and adaptive control in modern computational and financial systems.
