
TagDispatch: Dynamic LLM Dispatching

Updated 10 January 2026
  • TagDispatch is a dynamic dispatching framework that enables real-time switching between grammar fragments for structured generation in agentic LLMs.
  • It integrates an Aho–Corasick automaton for efficient tag detection with just-in-time compilation and cross-grammar caching to optimize decoding latency.
  • TagDispatch supports workflows like tool-calling and conditional channel switching, significantly reducing overhead compared to traditional CFG-based methods.

TagDispatch is a dynamic dispatching semantics for structured generation in agentic LLMs, designed to meet the challenges of modern workflows such as tool-calling, conditional channel switching, and dynamic JSON-schema generation. Unlike traditional constrained-decoding engines that rely on a statically defined context-free grammar (CFG), TagDispatch enables on-the-fly switching between sub-grammars, driven directly by the LLM's emitted tokens. The construct leverages the Aho–Corasick automaton for efficient tag detection and integrates just-in-time (JIT) compilation and cross-grammar caching to facilitate low-latency, scalable structured decoding in dynamic environments (Li et al., 7 Jan 2026).

1. Motivation and Context

TagDispatch addresses the limitations of classical constrained-decoding approaches in the context of agentic LLMs where structured generation is inherently dynamic. Traditional engines assume a single, statically known CFG and precompile all token-mask tables, rendering them unsuitable when the selection of grammars depends on LLM outputs—such as dynamically determined tool selections or runtime schema additions. TagDispatch is motivated by two core requirements: (a) real-time detection of the LLM's "mode" (i.e., which grammar branch is being taken), and (b) immediate, low-overhead installation of the corresponding grammar constraints without exhaustive pre-caching of all possible grammar combinations.

2. Formal Semantics of TagDispatch

A TagDispatch construct is formally defined as the tuple:

TD = (A, DispatchMap, S)

where:

  • T = {t₁, ..., tₘ} is the finite set of tags, with each tᵢ ∈ Σ⁺;
  • G = {G₁, ..., Gₙ} is the collection of CFGs or regular-grammar fragments;
  • S = {s₁, ..., sₖ} is the set of stop-strings (signals that terminate dispatch);
  • A is the Aho–Corasick automaton built over T ∪ S;
  • DispatchMap: T → G assigns each tag tᵢ to its target grammar Gᵢ.

At decode time, the mechanism maintains:

  • a mode bit m ∈ {"dispatching", "dispatched"};
  • a current grammar G_cur (valid only if m = "dispatched");
  • an Aho–Corasick state a ∈ States(A).
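The tuple and its decode-time state can be rendered as a minimal Python sketch; `TagDispatch`, `DecodeState`, and the placeholder grammar/automaton fields are illustrative names, not the authors' implementation:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List, Optional

class Mode(Enum):
    DISPATCHING = "dispatching"
    DISPATCHED = "dispatched"

@dataclass
class TagDispatch:
    tags: List[str]               # T = {t1, ..., tm}
    dispatch_map: Dict[str, Any]  # DispatchMap: tag t_i -> grammar G_i
    stop_strings: List[str]       # S = {s1, ..., sk}
    automaton: Any = None         # A: Aho-Corasick over T ∪ S (placeholder)

@dataclass
class DecodeState:
    mode: Mode = Mode.DISPATCHING
    current_grammar: Optional[Any] = None  # G_cur, valid only when DISPATCHED
    ac_state: int = 0                      # Aho-Corasick state a (root = 0)
```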

The decoding logic is as follows. While in "dispatching" mode, each candidate token is fed through the AC update; transitions that would simultaneously complete both a tag and a stop-string are disallowed. Completing a tag triggers a mode switch to "dispatched", installs the corresponding grammar, and resets the AC state; mask generation then follows the conventional logic for CFG-constrained decoding (e.g., Earley parsing). Completing a stop-string exits TagDispatch. Once "dispatched", mask generation adheres to G_cur, reverting to "dispatching" upon completing G_cur.

3. Algorithmic Integration

The mask generation for TagDispatch at each autoregressive time step is given by the following pseudocode:

state ← (m, a, G_cur, ParserState)
if m == "dispatching" then
  for each v ∈ Vocabulary do
    a′ ← AC.transition(a, v)
    if a′ completes any stop ∈ S then
      mask[v] ← 0; terminate loop entirely
    else if a′ completes tag tᵢ then
      mask[v] ← 1; m ← "dispatched"; G_cur ← Gᵢ; ParserState ← init(Gᵢ)
    else
      mask[v] ← 1  // no grammar constraints yet
  end
else  // m == "dispatched"
  (mask, ParserState′) ← EarleyMask(G_cur, ParserState)
  if ParserState′ indicates G_cur is done then
    m ← "dispatching"; a ← A.root
  end
end
return mask

The design avoids constructing a monolithic CFG interleaving all potential branches, and the tag detection (via AC) operates separately from the expensive grammar-masking logic. Grammars are JIT-compiled on entry, existing as "islands" only when invoked.
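The dispatch loop can be condensed into a runnable sketch. For brevity, naive suffix matching over the generated text stands in for the Aho–Corasick automaton, and a regular expression stands in for Earley-based grammar masking; grammar completion (reverting to "dispatching") is not modeled. All names here are illustrative, not the paper's implementation.

```python
import re

# Toy dispatch map: one tag routing into one "grammar" (a permissive
# JSON-ish regex standing in for a compiled CFG fragment).
TAGS = {"<json>": re.compile(r"[\[\]{}:,\"\w\s]*")}
STOPS = ["<end>"]

def step(buffer, token, state):
    """Advance one decode step; state = (mode, grammar) as in Sec. 2."""
    mode, grammar = state
    buffer += token
    if mode == "dispatching":
        if any(buffer.endswith(s) for s in STOPS):
            return buffer, ("exited", None)       # stop-string completed
        for tag, g in TAGS.items():
            if buffer.endswith(tag):              # tag completed:
                return buffer, ("dispatched", g)  # install grammar G_i
        return buffer, state                      # unconstrained so far
    else:  # dispatched: token must conform to the active grammar fragment
        assert grammar.fullmatch(token), f"token {token!r} rejected"
        return buffer, state

buffer, state = "", ("dispatching", None)
for tok in ["call ", "<json>", '{"x":', " 1}"]:
    buffer, state = step(buffer, tok, state)
print(state[0])  # prints "dispatched"
```

Note how the tag check runs independently of (and before) any grammar logic, mirroring the separation the design relies on.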

4. Complexity and Performance Analysis

TagDispatch is engineered for efficient per-token operations:

  • Tag matching via the Aho–Corasick automaton costs O(|v| + 1) per candidate token v, which amortizes to O(1) for fixed-length tags.
  • Earley mask generation is worst-case O(n³) in input length n, but typical per-token incremental updates cost O(n²), or O(n) for unambiguous or deterministic fragments.
  • The per-token cost is O(tag-check) + 1_dispatched · O(EarleyStep). In practice, the AC cost is sub-microsecond, with the Earley step dominating in "dispatched" mode.
  • Deferred (JIT) grammar compilation amortizes install cost over multiple tokens, while cross-grammar caching leverages reuse across grammars.
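The per-token formula above can be made concrete with a back-of-envelope model; the microsecond figures below are illustrative assumptions consistent with the stated orders of magnitude (sub-microsecond AC check, Earley step dominating), not measured values.

```python
# Illustrative cost model for: O(tag-check) + 1_{dispatched} * O(EarleyStep)
AC_COST_US = 0.5       # assumed tag check via Aho-Corasick, per token
EARLEY_STEP_US = 50.0  # assumed incremental Earley mask step, per token

def per_token_cost_us(dispatched: bool) -> float:
    """Per-token decoding overhead under the two modes."""
    return AC_COST_US + (EARLEY_STEP_US if dispatched else 0.0)

print(per_token_cost_us(False))  # 0.5  -> near-zero overhead outside a grammar
print(per_token_cost_us(True))   # 50.5 -> the Earley step dominates
```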

5. Implementation Optimizations

Numerous optimizations are integrated into TagDispatch to ensure high-throughput decoding:

  • Just-In-Time Compilation: Grammar fragments Gᵢ are compiled into FSM/Earley tables only upon entry, with a cache pool keyed by grammar hash for artifact reuse. Partial JIT precompiles the K heaviest parser states during prompt processing to mask initial compile latency.
  • Cross-Grammar Caching: Shared sub-structures across grammars (e.g., JSON types) are hashed via stable BFS node numbering and non-commutative combination. When lookahead contexts differ, partial cache reuse is achieved by leveraging accepted/rejected sets and re-scanning only ambiguous tokens.
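The hashing idea can be sketched as follows: grammar fragments are canonicalized by numbering nodes in BFS order from the start symbol, then folding edges in a fixed order so the combination is non-commutative. The graph encoding and hash construction here are assumptions for illustration, not the paper's exact scheme.

```python
from collections import deque
import hashlib

def grammar_hash(start, edges):
    """edges: dict node -> list of (label, child) in a fixed local order."""
    number, order, queue = {start: 0}, [start], deque([start])
    while queue:  # stable BFS numbering from the start symbol
        node = queue.popleft()
        for _label, child in edges.get(node, []):
            if child not in number:
                number[child] = len(number)
                order.append(child)
                queue.append(child)
    h = hashlib.sha256()
    for node in order:  # non-commutative fold: edge order matters
        for label, child in edges.get(node, []):
            h.update(f"{number[node]}:{label}:{number[child]};".encode())
    return h.hexdigest()

# Two structurally identical fragments hash the same despite different
# node names, so a shared sub-grammar (e.g., a JSON type) is cached once:
g1 = {"A": [("digit", "B")], "B": [("digit", "B")]}
g2 = {"X": [("digit", "Y")], "Y": [("digit", "Y")]}
print(grammar_hash("A", g1) == grammar_hash("X", g2))  # True
```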

6. Empirical Evaluation

Empirical benchmarks demonstrate significant performance gains for TagDispatch over prior engines:

| Metric | TagDispatch (Contour + XGrammar 2) | XGrammar (PDA-based) | llguidance |
|---|---|---|---|
| End-to-end decoding latency (dynamic tool-calling) | >6× reduction | Baseline | N/A |
| Per-token mask-generation overhead | ~50 µs | ~150 µs | >1 ms |
| Grammar compilation time | ~10 ms | >1 s | N/A |
| Function-calling throughput (relative) | ~7× improvement | Baseline | N/A |
| LLM-inference latency (vs. unconstrained) | within 6% | Higher | Higher |

TagDispatch achieves near-zero overhead when not operating in a dispatched grammar and rapid on-demand installation when transitioning to new grammars (Li et al., 7 Jan 2026).

7. Use Cases, Benefits, and Trade-Offs

TagDispatch is most effective for workflows requiring dynamic, output-driven branching, including tool-calling APIs (function selection followed by structured arguments), conditional reasoning (special channels), and interleaved structured/free-text fields.
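As a concrete illustration of such a workflow, a dispatch configuration might map tool tags to JSON-schema-derived argument grammars; the tag syntax, tool names, and schemas below are hypothetical, not from the paper.

```python
# Hypothetical tool-calling setup: each tag, once emitted by the LLM,
# dispatches decoding into a grammar compiled from that tool's schema.
DISPATCH_MAP = {
    "<tool:get_weather>": {  # grammar G_i would be derived from this schema
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
    "<tool:search>": {
        "type": "object",
        "properties": {"query": {"type": "string"}, "k": {"type": "integer"}},
        "required": ["query"],
    },
}
STOP_STRINGS = ["</tool>"]  # completing this string exits TagDispatch
```

Between tags the model generates free text unconstrained, so ordinary reasoning tokens pay only the tag-check cost.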

The principal advantages include:

  • Minimal decoding overhead in non-dispatched mode.
  • Low-latency, demand-driven grammar installation.
  • Extensive reuse of common fragments via cross-grammar caching.

Trade-offs encompass slightly increased decoding-loop complexity and the need to tune JIT compile parameters (such as the number K of precompiled parser states) to balance initial delay against per-token overhead. Extremely large one-off grammars still incur a compile cost on first entry, though recurring substructures benefit from cross-grammar caching, reducing redundant work.

TagDispatch provides a separation of "mode detection" (via tags) and "mode enforcement" (via CFG-based constrained decoding), offering a scalable and efficient structured-generation paradigm for advanced agentic LLM applications (Li et al., 7 Jan 2026).
