Token Velocity in Digital Systems

Updated 1 February 2026
  • Token velocity is a family of metrics quantifying the movement and turnover of tokens in a system, with applications spanning on-chain finance, LLM serving, and Transformer internals.
  • It employs methods like micro-velocity analysis on-chain and throughput metrics in LLM serving to optimize system performance and resource allocation.
  • Empirical studies demonstrate its impact through high token turnover rates, heterogeneous usage among participants, and enhanced scheduling in digital pipelines.

Token velocity refers to a family of quantitative metrics characterizing the rate at which tokens—whether representing digital assets, information-carrying vectors in neural models, or computation units in distributed systems—are transferred, processed, or evolve across temporal or algorithmic boundaries. The unifying theme across applications is the interpretation of "velocity" as an indicator of system dynamism, throughput, or turnover, providing fine-grained insight into the movement and utilization of tokens beyond simple aggregate counts.

1. Micro-Velocity in On-Chain Asset Circulation

The concept of token velocity originated in monetary economics as the "velocity of money," operationalized as the total value of transactions over the money supply. Extending this to on-chain assets, particularly liquid staking tokens (LSTs) such as stETH and wstETH on Ethereum, micro-velocity provides an address-level metric of turnover, capturing agent-level heterogeneity in token usage (Kraner et al., 21 Aug 2025).

Let $w_i^t(\tau)$ denote the amount of tokens held by address $i$ at time $t$ with holding age $\tau$, and $M_i(t) = \sum_\tau w_i^t(\tau)$ the total balance. The empirical age distribution at $i$ is $P_i^t(\tau) = w_i^t(\tau)/M_i(t)$. Micro-velocity for address $i$ is

$$V_i(t) = \sum_\tau \frac{1}{\tau}\, P_i^t(\tau)$$

with global velocity given by the balance-weighted mean,

$$V(t) = \frac{\sum_i M_i(t)\,V_i(t)}{\sum_i M_i(t)}.$$

This approach enables behavioral decomposition by cohort (e.g., "whales", "retail"), with $V_C(t)$ denoting category $C$'s velocity and $s_C(t) = \frac{M_C(t)\,V_C(t)}{M(t)\,V(t)}$ its share.
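The per-address and balance-weighted formulas above can be sketched directly in code. This is an illustrative toy implementation on hypothetical holding-age data, not the authors' pipeline; each address maps holding age (in blocks) to the amount held at that age.

```python
def micro_velocity(age_amounts):
    """V_i(t) = sum_tau (1/tau) * P_i^t(tau), where P_i^t(tau) = w_i^t(tau)/M_i(t)."""
    total = sum(age_amounts.values())  # M_i(t)
    return sum(amount / (age * total) for age, amount in age_amounts.items())

def global_velocity(addresses):
    """Balance-weighted mean of per-address micro-velocities."""
    balances = {i: sum(w.values()) for i, w in addresses.items()}  # M_i(t)
    m = sum(balances.values())                                     # M(t)
    return sum(balances[i] * micro_velocity(w) for i, w in addresses.items()) / m

# Hypothetical cohorts: a fast-turnover "whale" and a dormant retail holder.
whale = {1: 500.0, 2: 300.0}   # young holdings -> high 1/tau contributions
retail = {1000: 10.0}          # old holdings -> near-zero velocity
print(micro_velocity(whale))   # → 0.8125
print(global_velocity({"whale": whale, "retail": retail}))
```

Because the global velocity is balance-weighted, the whale cohort dominates $V(t)$ here, mirroring the paper's finding that turnover concentrates among large accounts.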

Empirical findings reveal exceptionally high global velocities ($10^{-2}$–$10^{-1}$ blocks$^{-1}$) for both stETH and wstETH, persistent concentration of turnover among large institutional accounts, and a market-driven transition from rebasing stETH to the composable, non-rebasing wstETH, which dominates DeFi protocol usage (Kraner et al., 21 Aug 2025).

2. Token Velocity in LLM Inference and Serving Systems

In distributed LLM serving architectures, token velocity quantifies the processing capacity or flow rate (tokens/second) of each stage—prefill, network, or decode—under current resource allocation (Lai et al., 3 Dec 2025). Formally, for the prefill stage,

$$V_P = \max_{\text{prefill load}} (\text{tokens completed}/\text{sec}),$$

while the decode stage is parameterized by

$$V_D = \frac{\sum_{r\in R} L_r}{\mathrm{TPOT}},$$

where $R$ is the set of completed decode requests and $L_r$ the total tokens per request.

Token velocity functions as a leading indicator for proactive autoscaling, enabling systems such as TokenScale to match resource provisioning to real-time demand by comparing instantaneous token arrival rates to per-stage capacity. This allows rapid adaptation to bursty workloads, reducing SLO violations (e.g., TTFT and TPOT) and over-provisioning compared to indicators such as GPU utilization or queue length. Convertible Decoders leverage velocity metrics to dynamically multiplex prefill and decode tasks, preserving SLOs during demand spikes (Lai et al., 3 Dec 2025).
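A minimal sketch of a velocity-driven scaling decision in the spirit described above. The decode-stage formula follows the definition of $V_D$; the thresholding policy (`headroom`, scale-down cutoff) is an assumption for illustration, not TokenScale's actual controller.

```python
def decode_velocity(completed_request_lengths, tpot_seconds):
    """V_D = (total tokens over completed decode requests R) / TPOT."""
    return sum(completed_request_lengths) / tpot_seconds

def scale_decision(arrival_rate_tps, stage_velocity_tps, headroom=0.8):
    """Compare instantaneous token arrival rate to per-stage capacity.
    Hypothetical policy: scale up *before* saturation, so velocity acts as
    a leading indicator rather than a lagging one like GPU utilization."""
    if arrival_rate_tps > headroom * stage_velocity_tps:
        return "scale_up"
    if arrival_rate_tps < 0.3 * stage_velocity_tps:
        return "scale_down"
    return "hold"

v_d = decode_velocity([128, 256, 64], tpot_seconds=0.05)  # 448 tokens / 0.05 s
print(scale_decision(8000, v_d))  # arrivals near capacity → "scale_up"
```

The key design point is that the comparison uses token flow rates on both sides, so a burst shows up in the arrival rate immediately, before queues build or latency SLOs are violated.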

3. Token Velocity in Preemptive LLM Streaming and Scheduling

In real-time text-generation pipelines such as TokenFlow, token velocity metrics underpin the prioritization and scheduling of requests under bursty conditions (Chen et al., 3 Oct 2025). Key quantities include:

  • Time-to-first-token (TTFT): $t_i^{\mathrm{ttft}} = t_{i,1}^{\mathrm{gen}} - t_i^0$
  • Time-between-tokens (TBT): $\delta_{i,j} = t_{i,j}^{\mathrm{gen}} - t_{i,j-1}^{\mathrm{gen}}$

Effective throughput refines raw tokens/sec by weighting each token by its buffer occupancy:
$$\mathrm{EffThruput} = \frac{1}{T} \sum_{i=1}^N \sum_{j=1}^{L_i} w_{i,j},$$
with $w_{i,j}$ decreasing when token buffers overflow and increasing when tokens are consumed in a timely fashion.
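The effective-throughput metric can be sketched as follows. The linear discount rule for $w_{i,j}$ here is a simplifying assumption (the paper derives weights from buffer-occupancy signals); only the outer sum matches the formula above exactly.

```python
def effective_throughput(token_weights_per_request, window_seconds):
    """EffThruput = (1/T) * sum over requests i, tokens j of w_ij."""
    return sum(sum(ws) for ws in token_weights_per_request) / window_seconds

def token_weight(buffer_occupancy, capacity):
    """Hypothetical weighting: discount tokens streamed into a nearly full
    client buffer, since they do not improve user-perceived throughput."""
    return max(0.0, 1.0 - buffer_occupancy / capacity)

# Two requests over a 1-second window; the second streams into a congested buffer.
weights = [
    [token_weight(b, capacity=100) for b in (10, 12, 15)],   # healthy consumer
    [token_weight(b, capacity=100) for b in (90, 95, 100)],  # overflowing buffer
]
print(effective_throughput(weights, window_seconds=1.0))
```

Both requests emit three tokens, but the congested request contributes almost nothing to effective throughput, which is exactly the signal a buffer-aware scheduler needs to justify preempting it.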

Token-velocity-based priority functions dynamically admit, preempt, or resume requests to optimize responsiveness and user-perceived throughput. Proactive, overlapped migration of key-value caches is coordinated largely by token velocity and buffer-driven utility signals, delivering large gains in effective throughput and latency under both simulated and production loads (Chen et al., 3 Oct 2025).

4. Token Velocity as a Learned or Geometric Feature in Transformers

Token velocity also appears as an abstract feature within the deep layers of Transformer architectures. In OrthoRank, token velocity quantifies the rate at which a given token's representation moves (via cosine similarity) toward a nearly stationary "sink token" as layers deepen (Shin et al., 5 Jul 2025). For normalized hidden states $\bar h_i^l$ and sink state $\bar s^l$, the cosine-based velocity is
$$v_i^l = \cos(\bar s^{l+1}, \bar h_i^{l+1}) - \cos(\bar s^l, \bar h_i^l).$$
OrthoRank demonstrates that the magnitude of the gradient of this similarity is proportional to the squared orthogonality $1-\cos^2(\bar s^l,\bar h_i^l)$, and uses this as an importance score: tokens least aligned with the sink (i.e., carrying the fastest new information) are selected for further computation, reducing inference cost while often improving perplexity and accuracy (Shin et al., 5 Jul 2025).
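The orthogonality-based importance score can be sketched as below. Shapes and the selection interface are assumptions for illustration, not the released OrthoRank code; only the score $1-\cos^2(\bar s^l, \bar h_i^l)$ follows the formula above.

```python
import numpy as np

def orthorank_scores(sink, hidden):
    """sink: (d,), hidden: (n, d). Importance = 1 - cos^2(sink, h_i) per token."""
    s = sink / np.linalg.norm(sink)
    h = hidden / np.linalg.norm(hidden, axis=1, keepdims=True)
    cos = h @ s                      # cosine similarity of each token to the sink
    return 1.0 - cos ** 2            # squared orthogonality

def select_tokens(sink, hidden, keep):
    """Indices of the `keep` tokens least aligned with the sink."""
    scores = orthorank_scores(sink, hidden)
    return np.argsort(scores)[::-1][:keep]

rng = np.random.default_rng(0)
sink = rng.normal(size=8)
hidden = np.vstack([sink * 2.0, rng.normal(size=(3, 8))])  # token 0 ∥ sink
print(select_tokens(sink, hidden, keep=2))  # the sink-aligned token 0 is dropped first
```

Token 0 is parallel to the sink, so its squared orthogonality is ~0 and it is the first candidate for skipping, while near-orthogonal tokens (those "moving fastest" toward the sink) keep full computation.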

5. Continuous-Time Kinematic Token Velocity for Sequential Decision Policies

In the Kinematic Tokenization framework for noisy time series, token velocity is explicitly constructed as the first derivative $c_{1,k}$ of a cubic spline fitted to log-price data (Kearney, 15 Jan 2026). Each segment of the time series is parameterized by position $c_{0,k}$, velocity $c_{1,k}$, acceleration $c_{2,k}$, and jerk $c_{3,k}$ coefficients, extracted from a variational spline denoising objective:
$$\min_{x(t),\, v(t),\, \{w_k\}} \int_{t_0}^{t_N} \frac{1}{2} v(t)^2\, dt + \frac{\alpha^2}{2} \sum_{k=0}^N w_k^2, \quad \text{subject to } y_k = x(t_k) + w_k,\ \ddot x(t) = v(t).$$
The velocity token $c_{1,k}$ is a denoised leading indicator of momentum, allowing Transformer-based policies to discriminate between genuine signal and noise, especially under risk-averse, abstention-inducing loss functions (Kearney, 15 Jan 2026). Empirically, action calibration and risk-adjusted returns are only attainable with this denoised velocity token; policies collapse under discrete or finite-difference tokenizations.
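A minimal sketch of extracting a velocity token as the first-derivative coefficient of a locally fitted cubic. A plain least-squares cubic fit stands in for the paper's variational spline objective (an assumption made for brevity); the point illustrated is that the fitted $c_{1}$ recovers drift from noisy observations where a finite difference would be dominated by noise.

```python
import numpy as np

def kinematic_tokens(t, y):
    """Fit y ≈ c0 + c1·(t-t0) + c2·(t-t0)² + c3·(t-t0)³ on one segment and
    return (c0, c1, c2, c3) = (position, velocity, acceleration, jerk)."""
    tau = t - t[0]
    c3, c2, c1, c0 = np.polyfit(tau, y, deg=3)  # polyfit: highest degree first
    return c0, c1, c2, c3

# Synthetic "log-price" segment: constant drift of 0.05 plus observation noise.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 50)
log_price = 0.05 * t + 0.001 * rng.normal(size=t.size)

_, velocity, _, _ = kinematic_tokens(t, log_price)
print(f"velocity token ≈ {velocity:.3f}")  # close to the true 0.05 drift
```

In contrast, the raw one-step finite difference `np.diff(log_price)` has a noise amplitude comparable to the per-step drift here, which is the failure mode the denoised velocity token avoids.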

6. Empirical Insights and Applications

Direct measurement and decomposition of token velocity have produced notable insights across domains:

  • In DeFi, token velocity quantifies the concentration of activity among large holders and the evolving composability of LSTs in protocol infrastructure, enabling granular monitoring of on-chain money dynamics (Kraner et al., 21 Aug 2025).
  • In LLM serving and streaming, velocity-based metrics optimize both system SLO compliance and resource usage, enabling proactive scaling and preemptive scheduling that dramatically improve practical throughput and latency (Lai et al., 3 Dec 2025, Chen et al., 3 Oct 2025).
  • In neural networks, velocity as a geometric or kinematic feature underlies efficient token selection and more robust sequential decision policies under adversarial or noisy conditions (Shin et al., 5 Jul 2025, Kearney, 15 Jan 2026).

7. Comparative Summary of Token Velocity Metrics

| Context | Velocity metric | Role/impact |
|---|---|---|
| On-chain assets (Kraner et al., 21 Aug 2025) | Micro-velocity $V_i(t)$ per address | Quantifies unit-level circulation; shows usage skew |
| LLM serving (Lai et al., 3 Dec 2025) | Max tokens/sec per pipeline stage | Proactive autoscaling, burst absorption |
| LLM streaming (Chen et al., 3 Oct 2025) | TTFT, TBT, weighted EffThruput | Responsive scheduling, preemption via buffer-aware utility |
| Transformer internals (Shin et al., 5 Jul 2025) | Cosine-similarity velocity on hypersphere | Token selection by geometric importance |
| Financial signals (Kearney, 15 Jan 2026) | Spline-derivative $c_{1,k}$ | Denoised sequential decision input |

The diversity of definitions reflects the centrality of token velocity as a unifying metric of flow and adaptive control in modern computational and financial systems.
