TagRouter: Efficient LLM Routing

Updated 22 May 2026
  • TagRouter is a training-free routing framework that orchestrates multiple large language models using semantic tag representations for enhanced open-domain text generation.
  • It leverages lightweight tag generation, scoring, and decision modules to dynamically assign queries based on model performance and cost efficiency.
  • Empirical results show TagRouter achieves a relative accept-rate gain of over 6% (about 4.8 percentage points) and up to 17% cost reduction compared to baseline methods.

TagRouter is a training-free model routing framework that orchestrates multiple LLMs through learned tag representations for efficient open-domain text generation. It seeks to maximize system-level response quality and minimize cost by dynamically allocating each user query to the most appropriate LLM from a heterogeneous pool, using a lightweight, semantic tagging mechanism. TagRouter empirically outperforms 13 baselines, achieving higher accept rates and substantial cost reductions, and supports practical ensembling of diverse, potentially black-box, LLMs in a unified, scalable architecture (Chen et al., 14 Jun 2025).

1. Open-Domain Model Routing: Problem and Objectives

Consider a pool of $K$ LLMs, $\mathcal{M} = \{M_1, ..., M_K\}$, where each $M_i$ has a known per-token cost $c_i$ and distinct response strengths. Given a batch of user queries $Q = \{q_1, ..., q_N\}$, the routing objective is to construct a policy $g: Q \to \mathcal{M}$ that assigns each query $q$ to a model $M^*(q)$ such that system-wide accuracy and resource cost are jointly optimized.

Accuracy is measured via accept rate (AR), defined as the fraction of model responses labeled “win” or “tie” in pairwise judgments against a high-quality reference LLM. Let $\mathrm{outcome}(M, q) \in \{\mathrm{win}, \mathrm{tie}, \mathrm{loss}\}$, and let the indicator $I_{\mathrm{win} \cup \mathrm{tie}}(M, q) = 1$ if the result is “win” or “tie” and $0$ otherwise. The core metrics are formalized as:

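  • Accept Rate (AR):

$$\mathrm{AR}(g) = \frac{1}{N} \sum_{n=1}^{N} I_{\mathrm{win} \cup \mathrm{tie}}(g(q_n), q_n)$$

  • Relative Cost:

$$\mathrm{Cost}(g) = \frac{\sum_{n=1}^{N} \mathrm{cost}(g(q_n), q_n)}{\sum_{n=1}^{N} \mathrm{cost}(M_{\mathrm{base}}, q_n)}$$

where $\mathrm{cost}(M_i, q)$ is the cost of answering $q$ with $M_i$ at per-token cost $c_i$, and $M_{\mathrm{base}}$ is the single-model baseline against which cost is normalized.

The target is high $\mathrm{AR}$ with low $\mathrm{Cost}$.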
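As a minimal sketch of how the two metrics could be computed from logged routing results: the function names, the per-query cost lists, and the toy data below are illustrative assumptions, not part of the original system.

```python
from typing import List


def accept_rate(outcomes: List[str]) -> float:
    """Fraction of routed responses judged 'win' or 'tie'."""
    accepted = sum(1 for o in outcomes if o in ("win", "tie"))
    return accepted / len(outcomes)


def relative_cost(routed_costs: List[float], baseline_costs: List[float]) -> float:
    """Total cost of the routed system divided by the total cost of
    answering every query with the single-model baseline."""
    return sum(routed_costs) / sum(baseline_costs)


# Toy data (hypothetical): four queries, two of them sent to a cheaper model.
outcomes = ["win", "tie", "loss", "win"]
routed = [1.0, 0.2, 0.2, 1.0]      # cost actually paid per query
baseline = [1.0, 1.0, 1.0, 1.0]    # cost if the baseline model answered everything

ar = accept_rate(outcomes)
cost = relative_cost(routed, baseline)
```

A router is preferable when it raises `ar` while keeping `cost` below 1.0.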

2. Tag Construction and Semantic Representation

TagRouter maps each input query $q$ to a compact set of semantic tags, $T(q)$, selected from a normalized vocabulary $\mathcal{T}$. Tag construction involves several processing stages:

  • Raw Tag Generation: For each $q$, a strong LLM (ERNIE-4.0-Turbo) generates coarse-grained tags via a system instruction, accumulating approximately 14,000 unique raw tags in the BCUQ corpus.
  • Normalization: Tags occurring fewer than five times are filtered out. Aggregation removes punctuation and normalizes casing. Tags are then clustered semantically: each tag is embedded with a PhraseBERT encoder $\phi$, DBSCAN clusters the embeddings, and nearest pairs are merged within clusters until each contains at least two exemplars (per Algorithm 1 (Chen et al., 14 Jun 2025)).
  • Mathematical Representation: The resulting vocabulary is $\mathcal{T}$ (size 1,601 for BCUQ). For a query $q$, a tag set $T(q)$ of size $|T(q)|$ is produced by the TagGenerator model. At routing time, each generated tag $t$ may be realigned to its nearest neighbor in $\mathcal{T}$ using cosine similarity:

$$\hat{t} = \arg\max_{t' \in \mathcal{T}} \cos\big(\phi(t), \phi(t')\big)$$
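The filtering, normalization, and nearest-neighbor realignment steps can be sketched as follows. This is a simplified illustration: the DBSCAN clustering and merge step is omitted, the two-dimensional vectors stand in for real PhraseBERT embeddings, and all function names are hypothetical.

```python
import math
import string
from collections import Counter
from typing import Dict, List, Sequence

_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(tag: str) -> str:
    """Strip punctuation and normalize casing."""
    return tag.translate(_PUNCT).lower().strip()


def build_vocab(raw_tags: List[str], min_count: int = 5) -> List[str]:
    """Drop raw tags seen fewer than min_count times, then normalize and dedupe."""
    counts = Counter(raw_tags)
    return sorted({normalize(t) for t, c in counts.items() if c >= min_count})


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def realign(tag_vec: Sequence[float], vocab_vecs: Dict[str, Sequence[float]]) -> str:
    """Return the vocabulary tag whose embedding is most similar to tag_vec."""
    return max(vocab_vecs, key=lambda t: cosine(tag_vec, vocab_vecs[t]))


# Toy usage with made-up tags and embeddings.
raw = ["Math!"] * 5 + ["math"] * 2 + ["Coding"] * 5
vocab = build_vocab(raw)
vocab_vecs = {"math": [1.0, 0.0], "coding": [0.0, 1.0]}
nearest = realign([0.9, 0.1], vocab_vecs)
```

Here the raw tag "math" is dropped by the frequency filter on its own, but "Math!" survives and normalizes to the same vocabulary entry.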

3. Inference Pipeline and Routing Algorithm

The TagRouter algorithm consists of three key modules:

3.1 TAGGENERATOR

Maps each query $q$ to a tag set $T(q)$ using the trained tag generator model.

3.2 TAGSCORER

Associates each model $M_i$ and tag $t$ with a precomputed performance score, built from:

  • outcome counts: the numbers of “win,” “tie,” and “loss” outcomes for $M_i$ on queries with tag $t$
  • outcome weights: per-outcome weights applied to those counts
  • a rarity factor that emphasizes rare but reliable tags

For a query, the model-level score aggregates these tag-level scores over the query's tag set $T(q)$.
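To make the scoring idea concrete, the sketch below combines weighted outcome counts with an IDF-style rarity factor. The exact functional form, the weight values, and the sum aggregation are assumptions chosen for illustration; they are not the formula or defaults from the paper.

```python
import math
from typing import Dict, Iterable, Tuple

Counts = Tuple[int, int, int]  # (wins, ties, losses) for one model on one tag


def tag_score(counts: Counts, total_queries: int,
              weights: Tuple[float, float, float] = (1.0, 0.5, -1.0)) -> float:
    """Hypothetical tag-level score: weighted outcome rate times a rarity factor.

    The weights are placeholders, not the paper's default values.
    """
    win, tie, loss = counts
    n = win + tie + loss
    if n == 0:
        return 0.0
    w_win, w_tie, w_loss = weights
    weighted_rate = (w_win * win + w_tie * tie + w_loss * loss) / n
    rarity = math.log(1 + total_queries / n)  # larger for rarer tags
    return weighted_rate * rarity


def query_score(tags: Iterable[str], table: Dict[str, float]) -> float:
    """Aggregate tag-level scores for one model (assumed here to be a sum)."""
    return sum(table.get(t, 0.0) for t in tags)


# Toy lookup table for a single model.
table = {
    "math": tag_score((8, 1, 1), total_queries=100),
    "chat": tag_score((1, 1, 8), total_queries=100),
}
score = query_score(["math", "unknown-tag"], table)
```

Because the table is precomputed, scoring a query at inference time reduces to dictionary lookups.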

3.3 TAGDECIDER

Selects the model $M^*(q)$ with the highest query-level score. For two-model cost-aware routing, the differential between the two candidates' query-level scores is compared to a threshold $\theta$; adjusting $\theta$ enables a continuous AR vs. cost trade-off.

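Pseudocode Outline:

  • Generate the tag set for the incoming query with TAGGENERATOR.
  • Look up the precomputed tag-level scores of each candidate model with TAGSCORER and aggregate them over the query's tags.
  • With TAGDECIDER, select the highest-scoring model; in the two-model cost-aware setting, compare the score differential with the threshold.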
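The three modules can be tied together in a short routing sketch. The keyword-based tagger, the score table, the model names, the sum aggregation, and the parameter name `theta` are all hypothetical; the direction of the differential (expensive minus cheap) is likewise an assumption.

```python
from typing import Callable, Dict, List

ScoreTable = Dict[str, Dict[str, float]]  # model name -> tag -> precomputed score


def model_score(model: str, tags: List[str], table: ScoreTable) -> float:
    """Aggregate precomputed tag scores for one model (assumed to be a sum)."""
    return sum(table[model].get(t, 0.0) for t in tags)


def route_two(query: str, tagger: Callable[[str], List[str]], table: ScoreTable,
              strong: str, weak: str, theta: float = 0.0) -> str:
    """Send the query to the expensive model only if its score margin exceeds theta."""
    tags = tagger(query)
    delta = model_score(strong, tags, table) - model_score(weak, tags, table)
    return strong if delta > theta else weak


def route_multi(query: str, tagger: Callable[[str], List[str]], table: ScoreTable) -> str:
    """Pick the highest-scoring model from the whole pool."""
    tags = tagger(query)
    return max(table, key=lambda m: model_score(m, tags, table))


def toy_tagger(query: str) -> List[str]:
    """Stand-in for the tag generator model: a trivial keyword rule."""
    return ["math"] if "integral" in query.lower() else ["chat"]


table: ScoreTable = {
    "big": {"math": 0.9, "chat": 0.5},
    "small": {"math": 0.2, "chat": 0.5},
}
choice_hard = route_two("Solve this integral", toy_tagger, table, "big", "small")
choice_easy = route_two("hello there", toy_tagger, table, "big", "small")
```

Raising `theta` shifts more traffic to the cheaper model, which is how a single scalar can trade accept rate against cost.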

4. Theoretical Properties and Cost-Performance Guarantees

TagRouter's empirical nature is complemented by formal characterizations:

  • Accept Rate at Threshold $\theta$:

$$\mathrm{AR}(\theta) = \frac{1}{N} \sum_{n=1}^{N} I_{\mathrm{win} \cup \mathrm{tie}}(g_\theta(q_n), q_n)$$

where $g_\theta$ is the routing policy obtained at threshold $\theta$.

  • AUC/PAUC: Treating the probability $p$ that $g_\theta$ selects the expensive LLM as the $x$-axis and $\mathrm{AR}$ as the $y$-axis:

$$\mathrm{AUC} = \int_0^1 \mathrm{AR}(p)\, dp$$

Partial AUC above the always-LLM baseline $\mathrm{AR}_{\mathrm{LLM}}$:

$$\mathrm{PAUC} = \int_0^1 \max\big(\mathrm{AR}(p) - \mathrm{AR}_{\mathrm{LLM}},\, 0\big)\, dp$$

  • Efficiency Guarantee: The guarantee is stated under the assumption that the routing score is an unbiased predictor of the true outcome. Empirically, the observed cost reduction is 17%.
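The AUC and PAUC summaries can be approximated numerically from a threshold sweep. The sketch below uses the trapezoidal rule over (expensive-model share, AR) points; the sample points are invented, and clipping before integration is a simplification that ignores where the curve crosses the baseline inside a segment.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (share of queries sent to the expensive LLM, AR)


def auc(points: List[Point]) -> float:
    """Trapezoidal area under the AR curve over the sampled x-range."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def pauc(points: List[Point], baseline_ar: float) -> float:
    """Area of the part of the curve lying above the always-LLM baseline."""
    clipped = [(x, max(y - baseline_ar, 0.0)) for x, y in points]
    return auc(clipped)


# Hypothetical sweep: AR peaks when only part of the traffic uses the expensive LLM.
sweep = [(0.0, 0.70), (0.5, 0.82), (1.0, 0.78)]
area = auc(sweep)
partial = pauc(sweep, baseline_ar=0.78)
```

A positive `partial` indicates that some routing thresholds beat sending every query to the expensive model.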

5. Empirical Evaluation and Benchmark Results

5.1 Datasets and Model Pool

  • Models: ERNIE-3.5 (“reference” LLM), EBspeed (cheaper), GLM4-9B, Qwen2.5-7B, EBspeedX
  • Benchmarks: BCUQ (95,559 real-user queries, 8 categories), Alpaca (51,014 synthetic), Dolly (14,013 crowd-sourced)

5.2 Baselines

Comparison baselines include single-model (EB3.5, EBspeed), routing-after-inference (FrugalGPT, PairRanker, Blending), routing-before-inference (RouteLLM-BERT, RouteLLM-SWR, RouteLLM-MF, RouterBench-KNN/MLP, FORC), and tag-based variants.

5.3 Metrics

Evaluated on AR (%), $\Delta$AR relative to EB3.5, Relative Cost (EB3.5=1.0), AUC, PAUC, and GPT-Rank.

5.4 BCUQ Results

Method            AR (%)   ΔAR (%)   Cost    AUC (%)   PAUC (%)
ERNIE-3.5         78.76    -         1.400   -         -
FrugalGPT         78.88    +0.15     1.324   70.11     0.01
PairRanker        78.76    +0.00     1.212   72.17     0.00
RouteLLM-MF       80.34    +2.01     1.197   73.94     0.12
RouterBench-KNN   80.45    +2.15     1.196   75.15     0.40
FORC              81.80    +3.86     1.182   75.73     0.76
Best Tag-based    82.02    +4.14     1.180   76.08     0.76
TagRouter         83.60    +6.15     1.164   76.10     1.46

TagRouter achieves a 6.15% relative gain in AR (78.76% to 83.60%, or 4.84 percentage points) with a 17.2% cost reduction relative to the single-LLM baseline, and the highest AUC and PAUC among all methods. It attains superior AUC in 7 of 8 BCUQ categories and scales robustly with model pool size: AUC grows from 0.7610 (2 candidates) to 0.8043 (5 candidates) while AR stays effectively constant. Routing between similarly sized models yields +6 pp AR at −14% cost.
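The ΔAR column is consistent with a relative uplift over the ERNIE-3.5 accept rate rather than a difference in percentage points. A quick arithmetic check using only the AR values from the table (the helper name is illustrative):

```python
BASELINE_AR = 78.76  # ERNIE-3.5 accept rate from the table

ar_by_method = {
    "FrugalGPT": 78.88,
    "PairRanker": 78.76,
    "RouteLLM-MF": 80.34,
    "RouterBench-KNN": 80.45,
    "FORC": 81.80,
    "Best Tag-based": 82.02,
    "TagRouter": 83.60,
}


def relative_uplift(ar: float, base: float = BASELINE_AR) -> float:
    """Relative AR change over the baseline, in percent."""
    return (ar - base) / base * 100.0


uplift = {name: round(relative_uplift(ar), 2) for name, ar in ar_by_method.items()}
absolute_gain = round(ar_by_method["TagRouter"] - BASELINE_AR, 2)  # percentage points
```

For TagRouter, `uplift` reproduces the tabulated 6.15, while `absolute_gain` is 4.84 percentage points.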

6. Practical Considerations, Scalability, and Limitations

  • Scalability and Evolution: TagRouter is intrinsically training-free. TAGSCORER and TAGDECIDER rely on lookup tables and threshold rules, not gradient-based optimization. When a new LLM ($M_{K+1}$) is added, it suffices to annotate approximately 1,000 sample queries for routing score estimation; TagRouter can then incorporate $M_{K+1}$ immediately with no retraining.
  • Latency and Resource Requirements: TAGGENERATOR comprises a 0.5B-parameter (500MB) LLM and a 33MB embedding model. Its non-repetitive inference minimizes latency.
  • Cost Control: The scalar threshold $\theta$ provides a fine-grained AR vs. cost trade-off, with a practical default setting.
  • Comparison to Prior Methods: Training-based routers (RouteLLM-MF/BERT, RouterBench) require full retraining for candidate pool changes. Speculative/iterative methods (FrugalGPT, FORC) induce higher latency and redundant queries. TagRouter uniquely supports proprietary or black-box LLMs, is agnostic to model counts, and handles open-domain prompts.
  • Limitations and Prospects: Language support is currently limited (TagGenerator is trained on English and Chinese) but extensible. Large-scale evaluation leverages LLM-as-judge with high agreement (measured by Cohen’s $\kappa$ with EB4.0); formal regret or PAC-style routing analyses remain open for future research.
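The training-free extension step can be pictured as adding one more entry to a lookup structure. The sketch below assumes a registry keyed by model name that stores per-tag outcome counts; the data layout, function names, and sample annotations are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Annotation = Tuple[List[str], str]          # (tags of a sample query, judged outcome)
TagCounts = Dict[str, Dict[str, int]]       # tag -> {"win": n, "tie": n, "loss": n}


def count_outcomes(annotations: Iterable[Annotation]) -> TagCounts:
    """Tally win/tie/loss outcomes per tag from annotated sample queries."""
    counts = defaultdict(lambda: {"win": 0, "tie": 0, "loss": 0})
    for tags, outcome in annotations:
        for tag in tags:
            counts[tag][outcome] += 1
    return dict(counts)


def register_model(registry: Dict[str, TagCounts], name: str,
                   annotations: Iterable[Annotation]) -> None:
    """Add a new candidate model without touching existing entries."""
    registry[name] = count_outcomes(annotations)


# Existing pool with one model; a new model is added from two annotated samples.
registry: Dict[str, TagCounts] = {"old": {"math": {"win": 3, "tie": 1, "loss": 0}}}
register_model(registry, "new", [(["math", "code"], "win"), (["math"], "loss")])
```

Only the new model's counts are computed; entries for existing models are left as they were, which is what makes the update retraining-free in this sketch.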

7. Summary

TagRouter introduces a semantic tagging paradigm for ensemble LLM routing, enabling a dynamically extensible “super model” that adapts to the evolving LLM ecosystem. State-of-the-art accept rates (+6.15% relative to the single-LLM baseline) and substantial cost savings (−17.20%) are achieved without per-candidate retraining, supporting deployment for cost-sensitive, open-domain text generation in real-world systems (Chen et al., 14 Jun 2025).
