TagDispatch: Dynamic LLM Dispatching
- TagDispatch is a dynamic dispatching framework that enables real-time switching between grammar fragments for structured generation in agentic LLMs.
- It integrates an Aho–Corasick automaton for efficient tag detection with just-in-time compilation and cross-grammar caching to optimize decoding latency.
- TagDispatch supports workflows like tool-calling and conditional channel switching, significantly reducing overhead compared to traditional CFG-based methods.
TagDispatch is a dynamic dispatching semantics for structured generation in agentic LLMs, designed to meet the challenges of modern workflows such as tool-calling, conditional channel switching, and dynamic JSON-schema generation. Unlike traditional constrained-decoding engines that rely on a statically defined context-free grammar (CFG), TagDispatch enables on-the-fly switching between sub-grammars, driven directly by the LLM's emitted tokens. The construct leverages the Aho–Corasick automaton for efficient tag detection and integrates just-in-time (JIT) compilation and cross-grammar caching to facilitate low-latency, scalable structured decoding in dynamic environments (Li et al., 7 Jan 2026).
1. Motivation and Context
TagDispatch addresses the limitations of classical constrained-decoding approaches in the context of agentic LLMs where structured generation is inherently dynamic. Traditional engines assume a single, statically known CFG and precompile all token-mask tables, rendering them unsuitable when the selection of grammars depends on LLM outputs—such as dynamically determined tool selections or runtime schema additions. TagDispatch is motivated by two core requirements: (a) real-time detection of the LLM's "mode" (i.e., which grammar branch is being taken), and (b) immediate, low-overhead installation of the corresponding grammar constraints without exhaustive pre-caching of all possible grammar combinations.
2. Formal Semantics of TagDispatch
A TagDispatch construct is formally defined as the tuple

$$\mathrm{TagDispatch} = (T, \mathcal{G}, S, A, \phi)$$

where:
- $T = \{t_1, \dots, t_n\}$ is the finite set of tags, with each $t_i \in \Sigma^*$;
- $\mathcal{G} = \{G_1, \dots, G_n\}$ is the collection of CFGs or regular-grammar fragments;
- $S$ is the set of stop-strings (signals that terminate dispatch);
- $A$ is the Aho–Corasick automaton built over $T \cup S$;
- $\phi: T \to \mathcal{G}$ assigns each tag $t_i$ to its target grammar $G_i = \phi(t_i)$.
At decode time, the mechanism maintains:
- a mode bit $m \in \{\text{dispatching}, \text{dispatched}\}$;
- a current grammar $G_{\mathrm{cur}}$ (valid only if $m = \text{dispatched}$);
- an Aho–Corasick state $a$.
The decoding logic is as follows. While in "dispatching" mode, each candidate token is fed through the AC update. Transitions are otherwise unconstrained, except that a single transition may not simultaneously complete both a tag and a stop-string. Completing a tag $t_i$ triggers a mode switch to "dispatched", installs the corresponding grammar $G_i = \phi(t_i)$, and resets the AC state; mask generation then follows the conventional logic for CFG-constrained decoding (e.g., Earley parsing). Completing a stop-string prompts an exit from TagDispatch. Once "dispatched", mask generation adheres to $G_{\mathrm{cur}}$, reverting to "dispatching" upon completion of $G_{\mathrm{cur}}$.
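For concreteness, the tuple and its decode-time state can be rendered as plain data structures. The following Python sketch is illustrative only; all names are chosen for exposition and do not come from the paper's implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional

class Grammar:
    """Placeholder for a compiled CFG/regular-grammar fragment."""

class Mode(Enum):
    DISPATCHING = "dispatching"
    DISPATCHED = "dispatched"

@dataclass
class TagDispatchConstruct:
    """The static tuple (T, G, S, A, phi); phi is folded into the dict keys."""
    tag_to_grammar: Dict[str, Grammar]  # T and phi: each tag maps to its grammar
    stop_strings: List[str]             # S: strings whose completion exits dispatch
    # A: an Aho-Corasick automaton would be built over tags + stop_strings

@dataclass
class DispatchState:
    """Mutable per-sequence state maintained at decode time."""
    mode: Mode = Mode.DISPATCHING
    current_grammar: Optional[Grammar] = None  # valid only when DISPATCHED
    ac_state: int = 0                          # AC automaton state (0 = root)
```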
3. Algorithmic Integration
The mask generation for TagDispatch at each autoregressive time step is given by the following pseudocode:
```text
state ← (m, a, G_cur, ParserState)
if m = "dispatching" then
    for each candidate token v ∈ Vocabulary do
        a′ ← AC.transition(a, v)
        if a′ completes some stop ∈ S then
            mask[v] ← 1   // allowed; emitting v completes the stop-string and exits TagDispatch
        else if a′ completes some tag tᵢ then
            mask[v] ← 1   // allowed; if v is sampled, commit the switch:
                          //   m ← "dispatched"; G_cur ← Gᵢ; ParserState ← init(Gᵢ); a ← A.root
        else
            mask[v] ← 1   // no grammar constraints yet
    end for
else  // m = "dispatched"
    (mask, ParserState′) ← EarleyMask(G_cur, ParserState)
    if ParserState′ indicates G_cur is complete then
        m ← "dispatching"; a ← A.root
    end if
end if
return mask
```

Note that the state transitions sketched in the comments are committed only for the token actually sampled, not during mask construction.
The design avoids constructing a monolithic CFG interleaving all potential branches, and the tag detection (via AC) operates separately from the expensive grammar-masking logic. Grammars are JIT-compiled on entry, existing as "islands" only when invoked.
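To make the control flow concrete, here is a runnable toy rendering of the loop above, presented as a minimal sketch: naive suffix matching on the decoded history stands in for the Aho–Corasick automaton, and a literal-string acceptor (`ToyGrammar`) stands in for the Earley mask step. All class and method names here are invented for illustration.

```python
from typing import Dict, List, Optional

class ToyGrammar:
    """Stand-in for a compiled fragment: accepts one fixed literal string."""
    def __init__(self, literal: str):
        self.literal, self.pos = literal, 0
    def allows(self, token: str) -> bool:
        return self.literal.startswith(token, self.pos)
    def advance(self, token: str) -> None:
        self.pos += len(token)
    def done(self) -> bool:
        return self.pos >= len(self.literal)

class ToyTagDispatch:
    """Per-token loop: suffix checks replace the AC automaton, ToyGrammar
    replaces the Earley mask step."""
    def __init__(self, tags: Dict[str, str], stops: List[str]):
        self.tags, self.stops = tags, stops
        self.mode, self.grammar, self.history = "dispatching", None, ""

    def mask(self, vocab: List[str]) -> Dict[str, int]:
        if self.mode == "dispatching":
            return {v: 1 for v in vocab}        # unconstrained until a tag fires
        return {v: int(self.grammar.allows(v)) for v in vocab}

    def commit(self, token: str) -> Optional[str]:
        """Apply the sampled token; returns 'exit' when a stop-string completes."""
        self.history += token
        if self.mode == "dispatching":
            if any(self.history.endswith(s) for s in self.stops):
                return "exit"                    # stop-string completed: leave TagDispatch
            for tag, literal in self.tags.items():
                if self.history.endswith(tag):   # tag completed: install its grammar
                    self.mode, self.grammar = "dispatched", ToyGrammar(literal)
        else:
            self.grammar.advance(token)
            if self.grammar.done():              # fragment finished: re-arm dispatching
                self.mode, self.grammar = "dispatching", None
        return None
```

A caller would use `mask(vocab)` to filter logits, sample a token, and then call `commit(token)`, reflecting that the mode switch is committed only for the token actually emitted.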
4. Complexity and Performance Analysis
TagDispatch is engineered for efficient per-token operations:
- Tag matching via the Aho–Corasick automaton is $O(|v|)$ per candidate token $v$, which amortizes to $O(1)$ per emitted character for fixed-length tags.
- Earley mask generation per grammar state is worst-case $O(n^3)$ in input length $n$, but typical per-token incremental updates cost $O(n^2)$ for unambiguous fragments or $O(n)$ for deterministic ones.
- The per-token cost is thus the AC update plus, in "dispatched" mode, the Earley mask step. In practice, the AC cost is sub-microsecond, with the Earley step dominating in "dispatched" mode.
- Deferred (JIT) grammar compilation amortizes install cost over multiple tokens, while cross-grammar caching leverages reuse across grammars.
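As a back-of-the-envelope illustration of the amortization argument, the snippet below combines the Section 6 figures (~10 ms per JIT compile, ~50 µs per-token masking) with a hypothetical fragment length of 200 tokens:

```python
compile_ms = 10.0       # one-off JIT compile on first entry (Section 6 figure)
mask_us = 50.0          # steady-state per-token mask cost (Section 6 figure)
fragment_tokens = 200   # hypothetical length of one structured fragment

amortized_us = compile_ms * 1000 / fragment_tokens + mask_us
print(f"effective per-token cost: {amortized_us:.0f} us")  # -> 100 us
```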
5. Implementation Optimizations
Numerous optimizations are integrated into TagDispatch to ensure high-throughput decoding:
- Just-In-Time Compilation: Grammar fragments are compiled into FSM/Earley tables only upon entry, with a cache pool keyed by grammar hash for artifact reuse (see the cache sketch after this list). Partial JIT precompiles the heaviest parser states during prompt processing to mask initial compile latency.
- Cross-Grammar Caching: Shared sub-structures across grammars (e.g., JSON types) are hashed via stable BFS node numbering and non-commutative combination. When lookahead contexts differ, partial cache reuse is achieved by leveraging accepted/rejected sets and re-scanning only ambiguous tokens.
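A minimal sketch of the hash-keyed cache follows, assuming a `compile_fn` stand-in for the real FSM/Earley compiler; hashing normalized source text is a simplification of the stable BFS node numbering described above.

```python
import hashlib
from typing import Callable, Dict

class GrammarCache:
    """Compiled artifacts keyed by a stable grammar hash, so fragments shared
    across grammars (e.g., common JSON types) are compiled at most once."""
    def __init__(self, compile_fn: Callable[[str], object]):
        self._compile = compile_fn          # stand-in for the real compiler
        self._store: Dict[str, object] = {}

    @staticmethod
    def key(grammar_src: str) -> str:
        # The real engine hashes the grammar graph via stable BFS node numbering
        # and non-commutative combination; hashing normalized source is a proxy.
        return hashlib.sha256(grammar_src.encode("utf-8")).hexdigest()

    def get(self, grammar_src: str) -> object:
        k = self.key(grammar_src)
        if k not in self._store:            # JIT: compile only on first entry
            self._store[k] = self._compile(grammar_src)
        return self._store[k]
```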
6. Empirical Evaluation
Empirical benchmarks demonstrate significant performance gains for TagDispatch over prior engines:
| Metric | TagDispatch (Contour+XGrammar 2) | XGrammar (PDA-based) | llguidance |
|---|---|---|---|
| End-to-end decoding latency (dynamic tool-calling) | >6× reduction | Baseline | N/A |
| Per-token mask-generation overhead | ~50 µs | ~150 µs | >1 ms |
| Grammar compilation time | ~10 ms | >1 s | N/A |
| Function-calling throughput (relative) | ~7× improvement | Baseline | N/A |
| LLM-inference latency (vs. unconstrained) | within 6% | Higher | Higher |
TagDispatch achieves near-zero overhead when not operating in a dispatched grammar and rapid on-demand installation when transitioning to new grammars (Li et al., 7 Jan 2026).
7. Use Cases, Benefits, and Trade-Offs
TagDispatch is most effective for workflows requiring dynamic, output-driven branching, including tool-calling APIs (function selection followed by structured arguments), conditional reasoning (special channels), and interleaved structured/free-text fields.
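As an illustration of the tool-calling case, a dispatch table might pair each tool tag with the JSON-schema grammar for its arguments; the tag strings and schemas below are invented for illustration.

```python
# Hypothetical dispatch table for a tool-calling workflow: each tag names a
# tool, and its grammar constrains the argument object that must follow.
tool_dispatch = {
    "tags": {
        "<tool:search>": {"type": "object",
                          "properties": {"query": {"type": "string"}},
                          "required": ["query"]},
        "<tool:calc>":   {"type": "object",
                          "properties": {"expr": {"type": "string"}},
                          "required": ["expr"]},
    },
    "stop_strings": ["</tools>"],  # completing this exits dispatch to free text
}
```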
The principal advantages include:
- Minimal decoding overhead in non-dispatched mode.
- Low-latency, demand-driven grammar installation.
- Extensive reuse of common fragments via cross-grammar caching.
Trade-offs include a slightly more complex decoding loop and the need to tune JIT-compilation parameters (such as the number of precompiled parser states) to balance initial delay against per-token overhead. Extremely large one-off grammars still incur a compile cost on first entry, though recurring substructures benefit from cross-grammar caching, reducing redundant work.
TagDispatch provides a separation of "mode detection" (via tags) and "mode enforcement" (via CFG-based constrained decoding), offering a scalable and efficient structured-generation paradigm for advanced agentic LLM applications (Li et al., 7 Jan 2026).