TagDispatch: Dynamic LLM Dispatching
- TagDispatch is a dynamic dispatching framework that enables real-time switching between grammar fragments for structured generation in agentic LLMs.
- It integrates an Aho–Corasick automaton for efficient tag detection with just-in-time compilation and cross-grammar caching to optimize decoding latency.
- TagDispatch supports workflows like tool-calling and conditional channel switching, significantly reducing overhead compared to traditional CFG-based methods.
TagDispatch is a dynamic dispatching semantics for structured generation in agentic LLMs, designed to meet the challenges of modern workflows such as tool-calling, conditional channel switching, and dynamic JSON-schema generation. Unlike traditional constrained-decoding engines that rely on a statically defined context-free grammar (CFG), TagDispatch enables on-the-fly switching between sub-grammars, driven directly by the LLM's emitted tokens. The construct leverages the Aho–Corasick automaton for efficient tag detection and integrates just-in-time (JIT) compilation and cross-grammar caching to facilitate low-latency, scalable structured decoding in dynamic environments (Li et al., 7 Jan 2026).
1. Motivation and Context
TagDispatch addresses the limitations of classical constrained-decoding approaches in the context of agentic LLMs where structured generation is inherently dynamic. Traditional engines assume a single, statically known CFG and precompile all token-mask tables, rendering them unsuitable when the selection of grammars depends on LLM outputs—such as dynamically determined tool selections or runtime schema additions. TagDispatch is motivated by two core requirements: (a) real-time detection of the LLM's "mode" (i.e., which grammar branch is being taken), and (b) immediate, low-overhead installation of the corresponding grammar constraints without exhaustive pre-caching of all possible grammar combinations.
2. Formal Semantics of TagDispatch
A TagDispatch construct is formally defined as the tuple

$$\mathrm{TagDispatch} = (T, \mathcal{G}, S, A, \phi)$$

where:
- $T = \{t_1, \dots, t_n\}$ is the finite set of tags, with each $t_i \in \Sigma^*$;
- $\mathcal{G} = \{G_1, \dots, G_n\}$ is the collection of CFGs or regular-grammar fragments;
- $S$ is the set of stop-strings (signals that terminate dispatch);
- $A$ is the Aho–Corasick automaton built over $T \cup S$;
- $\phi: T \to \mathcal{G}$ assigns each tag $t_i$ to its target grammar $G_i = \phi(t_i)$.
At decode time, the mechanism maintains:
- a mode bit $m \in \{\text{dispatching}, \text{dispatched}\}$;
- a current grammar $G_{\mathrm{cur}}$ (valid only if $m = \text{dispatched}$);
- an Aho–Corasick state $a$.
The decoding logic is as follows. While in "dispatching" mode, each candidate token is fed through the AC update. Transitions are otherwise unconstrained, except that a single transition may not simultaneously complete both a tag and a stop-string. Completing a tag $t_i$ triggers a mode switch to "dispatched", installs the corresponding grammar $G_i = \phi(t_i)$, and resets the AC state; mask generation then follows the conventional logic for CFG-constrained decoding (e.g., Earley parsing). Completing a stop-string prompts an exit from TagDispatch. Once "dispatched", mask generation adheres to $G_{\mathrm{cur}}$, reverting to "dispatching" upon completion of $G_{\mathrm{cur}}$.
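For concreteness, the tuple and its decode-time state can be rendered as plain data structures. The following Python sketch is illustrative only; all names are chosen for exposition and do not come from the paper's implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional

class Grammar:
    """Placeholder for a compiled CFG/regular-grammar fragment."""

class Mode(Enum):
    DISPATCHING = "dispatching"
    DISPATCHED = "dispatched"

@dataclass
class TagDispatchConstruct:
    """The static tuple (T, G, S, A, phi); phi is folded into the dict keys."""
    tag_to_grammar: Dict[str, Grammar]  # T and phi: each tag maps to its grammar
    stop_strings: List[str]             # S: strings whose completion exits dispatch
    # A: an Aho-Corasick automaton would be built over tags + stop_strings

@dataclass
class DispatchState:
    """Mutable per-sequence state maintained at decode time."""
    mode: Mode = Mode.DISPATCHING
    current_grammar: Optional[Grammar] = None  # valid only when DISPATCHED
    ac_state: int = 0                          # AC automaton state (0 = root)
```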
3. Algorithmic Integration
The mask generation for TagDispatch at each autoregressive time step is given by the following pseudocode:
```text
state ← (m, a, G_cur, ParserState)
if m = "dispatching" then
    for each candidate token v ∈ Vocabulary do
        a′ ← AC.transition(a, v)
        if a′ completes some stop ∈ S then
            mask[v] ← 1   // allowed; emitting v completes the stop-string and exits TagDispatch
        else if a′ completes some tag tᵢ then
            mask[v] ← 1   // allowed; if v is sampled, commit the switch:
                          //   m ← "dispatched"; G_cur ← Gᵢ; ParserState ← init(Gᵢ); a ← A.root
        else
            mask[v] ← 1   // no grammar constraints yet
    end for
else  // m = "dispatched"
    (mask, ParserState′) ← EarleyMask(G_cur, ParserState)
    if ParserState′ indicates G_cur is complete then
        m ← "dispatching"; a ← A.root
    end if
end if
return mask
```

Note that the state transitions sketched in the comments are committed only for the token actually sampled, not during mask construction.
The design avoids constructing a monolithic CFG interleaving all potential branches, and the tag detection (via AC) operates separately from the expensive grammar-masking logic. Grammars are JIT-compiled on entry, existing as "islands" only when invoked.
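To make the control flow concrete, here is a runnable toy rendering of the loop above, presented as a minimal sketch: naive suffix matching on the decoded history stands in for the Aho–Corasick automaton, and a literal-string acceptor (`ToyGrammar`) stands in for the Earley mask step. All class and method names here are invented for illustration.

```python
from typing import Dict, List, Optional

class ToyGrammar:
    """Stand-in for a compiled fragment: accepts one fixed literal string."""
    def __init__(self, literal: str):
        self.literal, self.pos = literal, 0
    def allows(self, token: str) -> bool:
        return self.literal.startswith(token, self.pos)
    def advance(self, token: str) -> None:
        self.pos += len(token)
    def done(self) -> bool:
        return self.pos >= len(self.literal)

class ToyTagDispatch:
    """Per-token loop: suffix checks replace the AC automaton, ToyGrammar
    replaces the Earley mask step."""
    def __init__(self, tags: Dict[str, str], stops: List[str]):
        self.tags, self.stops = tags, stops
        self.mode, self.grammar, self.history = "dispatching", None, ""

    def mask(self, vocab: List[str]) -> Dict[str, int]:
        if self.mode == "dispatching":
            return {v: 1 for v in vocab}        # unconstrained until a tag fires
        return {v: int(self.grammar.allows(v)) for v in vocab}

    def commit(self, token: str) -> Optional[str]:
        """Apply the sampled token; returns 'exit' when a stop-string completes."""
        self.history += token
        if self.mode == "dispatching":
            if any(self.history.endswith(s) for s in self.stops):
                return "exit"                    # stop-string completed: leave TagDispatch
            for tag, literal in self.tags.items():
                if self.history.endswith(tag):   # tag completed: install its grammar
                    self.mode, self.grammar = "dispatched", ToyGrammar(literal)
        else:
            self.grammar.advance(token)
            if self.grammar.done():              # fragment finished: re-arm dispatching
                self.mode, self.grammar = "dispatching", None
        return None
```

A caller would use `mask(vocab)` to filter logits, sample a token, and then call `commit(token)`, reflecting that the mode switch is committed only for the token actually emitted.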
4. Complexity and Performance Analysis
TagDispatch is engineered for efficient per-token operations:
- Tag matching via the Aho–Corasick automaton is $O(|v|)$ per candidate token $v$, which amortizes to $O(1)$ per emitted character for fixed-length tags.
- Earley mask generation per grammar state is worst-case $O(n^3)$ in input length $n$, but typical per-token incremental updates cost $O(n^2)$ for unambiguous fragments or $O(n)$ for deterministic ones.
- The per-token cost is thus the AC update plus, in "dispatched" mode, the Earley mask step. In practice, the AC cost is sub-microsecond, with the Earley step dominating in "dispatched" mode.
- Deferred (JIT) grammar compilation amortizes install cost over multiple tokens, while cross-grammar caching leverages reuse across grammars.
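As a back-of-the-envelope illustration of the amortization argument, the snippet below combines the Section 6 figures (~10 ms per JIT compile, ~50 µs per-token masking) with a hypothetical fragment length of 200 tokens:

```python
compile_ms = 10.0       # one-off JIT compile on first entry (Section 6 figure)
mask_us = 50.0          # steady-state per-token mask cost (Section 6 figure)
fragment_tokens = 200   # hypothetical length of one structured fragment

amortized_us = compile_ms * 1000 / fragment_tokens + mask_us
print(f"effective per-token cost: {amortized_us:.0f} us")  # -> 100 us
```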
5. Implementation Optimizations
Numerous optimizations are integrated into TagDispatch to ensure high-throughput decoding:
- Just-In-Time Compilation: Grammar fragments are compiled into FSM/Earley tables only upon entry, with a cache pool keyed by grammar hash for artifact reuse (see the cache sketch after this list). Partial JIT precompiles the heaviest parser states during prompt processing to mask initial compile latency.
- Cross-Grammar Caching: Shared sub-structures across grammars (e.g., JSON types) are hashed via stable BFS node numbering and non-commutative combination. When lookahead contexts differ, partial cache reuse is achieved by leveraging accepted/rejected sets and re-scanning only ambiguous tokens.
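A minimal sketch of the hash-keyed cache follows, assuming a `compile_fn` stand-in for the real FSM/Earley compiler; hashing normalized source text is a simplification of the stable BFS node numbering described above.

```python
import hashlib
from typing import Callable, Dict

class GrammarCache:
    """Compiled artifacts keyed by a stable grammar hash, so fragments shared
    across grammars (e.g., common JSON types) are compiled at most once."""
    def __init__(self, compile_fn: Callable[[str], object]):
        self._compile = compile_fn          # stand-in for the real compiler
        self._store: Dict[str, object] = {}

    @staticmethod
    def key(grammar_src: str) -> str:
        # The real engine hashes the grammar graph via stable BFS node numbering
        # and non-commutative combination; hashing normalized source is a proxy.
        return hashlib.sha256(grammar_src.encode("utf-8")).hexdigest()

    def get(self, grammar_src: str) -> object:
        k = self.key(grammar_src)
        if k not in self._store:            # JIT: compile only on first entry
            self._store[k] = self._compile(grammar_src)
        return self._store[k]
```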
6. Empirical Evaluation
Empirical benchmarks demonstrate significant performance gains for TagDispatch over prior engines:
| Metric | TagDispatch (Contour+XGrammar 2) | XGrammar (PDA-based) | llguidance |
|---|---|---|---|
| End-to-end decoding latency (dynamic tool-calling) | >6× reduction | Baseline | N/A |
| Per-token mask-generation overhead | ~50 µs | ~150 µs | >1 ms |
| Grammar compilation time | ~10 ms | >1 s | N/A |
| Function-calling throughput (relative) | ~7× improvement | Baseline | N/A |
| LLM-inference latency (vs. unconstrained) | within 6% | Higher | Higher |
TagDispatch achieves near-zero overhead when not operating in a dispatched grammar and rapid on-demand installation when transitioning to new grammars (Li et al., 7 Jan 2026).
7. Use Cases, Benefits, and Trade-Offs
TagDispatch is most effective for workflows requiring dynamic, output-driven branching, including tool-calling APIs (function selection followed by structured arguments), conditional reasoning (special channels), and interleaved structured/free-text fields.
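As an illustration of the tool-calling case, a dispatch table might pair each tool tag with the JSON-schema grammar for its arguments; the tag strings and schemas below are invented for illustration.

```python
# Hypothetical dispatch table for a tool-calling workflow: each tag names a
# tool, and its grammar constrains the argument object that must follow.
tool_dispatch = {
    "tags": {
        "<tool:search>": {"type": "object",
                          "properties": {"query": {"type": "string"}},
                          "required": ["query"]},
        "<tool:calc>":   {"type": "object",
                          "properties": {"expr": {"type": "string"}},
                          "required": ["expr"]},
    },
    "stop_strings": ["</tools>"],  # completing this exits dispatch to free text
}
```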
The principal advantages include:
- Minimal decoding overhead in non-dispatched mode.
- Low-latency, demand-driven grammar installation.
- Extensive reuse of common fragments via cross-grammar caching.
Trade-offs include a slightly more complex decoding loop and the need to tune JIT-compilation parameters (such as the number of precompiled parser states) to balance initial delay against per-token overhead. Extremely large one-off grammars still incur a compile cost on first entry, though recurring substructures benefit from cross-grammar caching, reducing redundant work.
TagDispatch provides a separation of "mode detection" (via tags) and "mode enforcement" (via CFG-based constrained decoding), offering a scalable and efficient structured-generation paradigm for advanced agentic LLM applications (Li et al., 7 Jan 2026).