
GMTRouter: Personalized LLM Routing Framework

Updated 25 May 2026
  • GMTRouter is a personalized LLM routing framework that models multi-turn interactions as a heterogeneous graph comprising users, queries, responses, models, and turns.
  • The paper introduces a novel Heterogeneous Graph Transformer with tailored message-passing and relation-specific attention to enable effective few-shot user modeling without costly retraining.
  • Empirical evaluations show accuracy gains of up to 21.6 percentage points and robust performance under cold-start conditions across diverse datasets, highlighting its practical adaptability.

GMTRouter is a personalized LLM routing framework that formulates multi-turn user-LLM interactions as a heterogeneous graph and employs a tailored message-passing mechanism for effective few-shot user modeling and routing. GMTRouter addresses the challenges of idiosyncratic user preferences, data sparsity, feedback noise, and the limitations of existing LLM routing solutions by generalizing to new users and evolving preferences without costly retraining or fine-tuning (Xie et al., 29 Oct 2025).

1. LLM Routing: Motivation, Formalization, and Challenges

LLM routing is the task of selecting, for each user query $q$, an appropriate model $m$ from a candidate set $\mathcal{M} = \{m_1, \dots, m_n\}$ with the goal of maximizing user satisfaction under computational constraints. In the personalized, multi-turn setting, a user's full interaction history is represented by

$$\mathcal{H}_u = \bigl\{(q^{(t)},\, r^{(t)},\, f^{(t)})\bigr\}_{t=1}^{T_u},$$

where $f^{(t)}$ is the user's structured or unstructured feedback after each LLM response $r^{(t)}$. The personalized routing problem is thus to learn

$$\pi(u, q) = \arg\max_{m\in\mathcal{M}}\; \mathbb{E}[f(u, q, m)],$$

where $\pi$ is a routing function optimizing user-specific, query-conditioned satisfaction.
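The arg max in the routing objective can be illustrated with a deliberately naive estimator. The sketch below is not GMTRouter's learned predictor: it approximates the expected feedback by the user's mean past feedback per model and ignores the query. The log format and the names `empirical_router` and `route` are assumptions made for this example.

```python
from collections import defaultdict


def empirical_router(history, candidates, default=0.0):
    """Return a routing function pi(u, q) built from logged feedback.

    history: iterable of (user, query, model, feedback) tuples
             (a hypothetical log format).
    E[f(u, q, m)] is approximated by the user's mean past feedback for
    model m; the query is ignored in this naive estimator.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for user, _query, model, feedback in history:
        totals[(user, model)] += feedback
        counts[(user, model)] += 1

    def expected_feedback(user, model):
        n = counts.get((user, model), 0)
        return totals[(user, model)] / n if n else default

    def route(user, query):
        # arg max over the candidate set of the estimated expected feedback
        return max(candidates, key=lambda m: expected_feedback(user, m))

    return route


log = [
    ("alice", "q1", "model_a", 1.0),
    ("alice", "q2", "model_b", 0.0),
    ("bob", "q3", "model_a", 0.0),
    ("bob", "q4", "model_b", 1.0),
]
route = empirical_router(log, candidates=["model_a", "model_b"])
print(route("alice", "new query"))  # model_a
print(route("bob", "new query"))    # model_b
```

The two users receive different models for the same query, which is the behavior a personalized router is meant to capture. A per-user average like this has no way to handle unseen users or sparse feedback, which is the gap the graph-based formulation targets.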

Empirical analyses show that inter-user Spearman correlations of LLM rankings are only 44–65% of intra-user consistency, confirming a high degree of user-level idiosyncrasy. Existing routing strategies, including FrugalGPT, C2MAB-V, and GraphRouter, do not capture this complexity: they either ignore longitudinal dialogue structure or cannot reliably encode sparse and noisy feedback into personal user profiles, especially under conditions of missing and inconsistent data.
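To make the inter-user versus intra-user comparison concrete, the following sketch computes Spearman rank correlations between per-model scores. The scores are toy values invented for illustration, not data from the paper, and the implementation assumes tie-free score lists.

```python
def ranks(scores):
    """Rank positions (1 = lowest score); assumes no ties for simplicity."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman(x, y):
    """Spearman rank correlation for tie-free score lists of equal length."""
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Toy per-model scores for four models (illustrative values only).
user_a_first_half = [0.9, 0.7, 0.4, 0.2]
user_a_second_half = [0.8, 0.6, 0.5, 0.1]
user_b = [0.3, 0.8, 0.2, 0.9]

intra_user = spearman(user_a_first_half, user_a_second_half)
inter_user = spearman(user_a_first_half, user_b)
print(intra_user)  # 1.0
print(inter_user)
```

In this toy case the same user ranks the models identically across both halves of their history, while the two users disagree. The reported 44–65% ratio expresses the same kind of gap, measured on real preference data.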

2. Heterogeneous Graph Representation of Multi-turn Interactions

GMTRouter introduces a novel heterogeneous graph construction to preserve relational structure among users, LLMs, queries, responses, and interaction turns. Specifically, the graph is defined as

$$\mathcal{G} = (\mathcal{V},\, \mathcal{E},\, \mathcal{X}),$$

with node set decomposition

$$\mathcal{V} = \mathcal{U} \cup \mathcal{M} \cup \mathcal{Q} \cup \mathcal{R} \cup \mathcal{T},$$

where $\mathcal{U}$ (users), $\mathcal{M}$ (models), $\mathcal{Q}$ (queries), $\mathcal{R}$ (responses), and $\mathcal{T}$ (turns) are node types. Edges include user–turn, turn–model, turn–query, turn–response, and sequential turn–turn temporal links.
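The node and edge types listed above can be sketched as a plain typed listing for a single user. This is a minimal illustration under assumed conventions: the input format, the node identifiers, and the function name `build_interaction_graph` are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def build_interaction_graph(user_id, history):
    """Build typed node and edge listings for one user's multi-turn history.

    history: list of (query_text, model_name, response_text) per turn, in
             chronological order (a hypothetical input format).
    Returns (nodes, edges, texts):
      nodes: node type -> list of node ids
      edges: relation name -> list of (source, target) pairs
      texts: node id -> raw text, for query and response nodes
    """
    nodes = defaultdict(list)
    edges = defaultdict(list)
    texts = {}
    nodes["user"].append(user_id)
    previous_turn = None
    for t, (query, model, response) in enumerate(history):
        turn = f"{user_id}/turn{t}"
        query_id = f"{turn}/query"
        response_id = f"{turn}/response"
        nodes["turn"].append(turn)
        nodes["query"].append(query_id)
        nodes["response"].append(response_id)
        texts[query_id] = query
        texts[response_id] = response
        if model not in nodes["model"]:
            nodes["model"].append(model)  # model nodes are shared across turns
        edges["user-turn"].append((user_id, turn))
        edges["turn-model"].append((turn, model))
        edges["turn-query"].append((turn, query_id))
        edges["turn-response"].append((turn, response_id))
        if previous_turn is not None:
            edges["turn-turn"].append((previous_turn, turn))  # temporal link
        previous_turn = turn
    return dict(nodes), dict(edges), texts


nodes, edges, texts = build_interaction_graph(
    "u1",
    [
        ("What is 2 + 2?", "model_a", "4"),
        ("And times 3?", "model_b", "12"),
        ("Explain why.", "model_a", "Because 4 * 3 = 12."),
    ],
)
print(len(nodes["turn"]), nodes["model"], edges["turn-turn"])
```

Turn nodes act as hubs here: each one connects a user to the query, the responding model, and the response of that turn, and the turn–turn edges preserve the order of the dialogue.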

Feature matrices for nodes are as follows:

Node Type | Feature Construction | Dimension
--- | --- | ---
User ($\mathcal{U}$) | Initialized as all zeros | […]
Model ($\mathcal{M}$) | PLM-embedded overview | […]
Query ($\mathcal{Q}$) | PLM-encoded | […]
Response ($\mathcal{R}$) | PLM-encoded | […]
Turn ($\mathcal{T}$) | Initialized as all zeros | […]

User feedback $f^{(t)}$ is discretized and projected to a scalar feature attached to response nodes. The adjacency structure can be encoded either as relation-specific adjacency tensors or as neighbor-type lists, reflecting edge heterogeneity and temporal ordering.
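The initialization scheme can be sketched as follows. Several details are assumptions made only for this illustration: `toy_encode` is a deterministic stand-in for a frozen PLM encoder, the embedding size of 8 is arbitrary, the feedback is assumed to be a rating in [0, 1] split into five evenly spaced levels, and the scalar is simply appended to the response embedding.

```python
import hashlib

DIM = 8  # illustrative embedding size, not the paper's


def toy_encode(text, dim=DIM):
    """Stand-in for a frozen PLM encoder: a deterministic pseudo-embedding."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [digest[i] / 255.0 for i in range(dim)]


def discretize_feedback(rating, levels=5):
    """Map a rating in [0, 1] to one of `levels` evenly spaced scalar values."""
    clipped = min(max(rating, 0.0), 1.0)
    bucket = min(int(clipped * levels), levels - 1)
    return bucket / (levels - 1)


def init_features(node_type, text=None, rating=None):
    """Initial feature vector for a node of the given type."""
    if node_type in ("user", "turn"):
        # Zero-initialized; these are filled in by message passing.
        return [0.0] * DIM
    if node_type in ("model", "query"):
        return toy_encode(text)
    if node_type == "response":
        # Assumption: the feedback scalar is appended to the text embedding.
        return toy_encode(text) + [discretize_feedback(rating)]
    raise ValueError(f"unknown node type: {node_type}")


print(init_features("user"))
print(discretize_feedback(0.5))  # 0.5
print(len(init_features("response", text="12", rating=1.0)))  # 9
```

Because user and turn nodes start at zero, everything the model knows about a user has to arrive through the graph, which is what makes the same trained network applicable to users it has never seen.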

3. Message Passing with a Heterogeneous Graph Transformer

GMTRouter leverages a Heterogeneous Graph Transformer (HGT) to propagate information across typed nodes and edges. At each layer, the representation of every node is updated by aggregating messages from its neighbors, with each message weighted by an attention score.

The attention scores are relation-specific: each edge type has its own attention parameters, so that, for example, a turn–response edge and a turn–turn edge can be weighted differently.

This architecture facilitates the flow of feedback $f^{(t)}$ from response nodes to user embeddings through turns and incorporates different edge semantics via dedicated per-relation weights. It thereby allows effective integration of sparse feedback and captures the nuanced, multi-relational dependencies needed for robust user modeling.
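A generic relation-aware attention update, in the spirit of HGT, might look like the sketch below. It is not the paper's exact formulation: it uses a single head, a shared query projection, per-relation key and value matrices, and a plain residual connection, and every name in it is an assumption for illustration.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relation_attention_update(target, neighbors, w_query, w_key, w_value):
    """One single-head, relation-aware attention update for a target node.

    target:    feature vector of the node being updated
    neighbors: list of (relation, feature_vector) pairs
    w_key, w_value: dict relation -> projection matrix (one per edge type)
    Returns (updated_vector, attention_weights).
    """
    query = matvec(w_query, target)
    scale = math.sqrt(len(query))
    scores, messages = [], []
    for relation, features in neighbors:
        key = matvec(w_key[relation], features)
        scores.append(sum(q * k for q, k in zip(query, key)) / scale)
        messages.append(matvec(w_value[relation], features))
    weights = softmax(scores)
    aggregated = [
        sum(a * message[i] for a, message in zip(weights, messages))
        for i in range(len(query))
    ]
    # Residual connection keeps the node's own state in the update.
    return [t + a for t, a in zip(target, aggregated)], weights


identity = [[1.0, 0.0], [0.0, 1.0]]
doubled = [[2.0, 0.0], [0.0, 2.0]]
turn_node = [0.0, 0.0]  # zero-initialized, as for user and turn nodes
neighbors = [("turn-query", [1.0, 0.0]), ("turn-response", [0.0, 1.0])]
updated, weights = relation_attention_update(
    turn_node,
    neighbors,
    w_query=identity,
    w_key={"turn-query": identity, "turn-response": identity},
    w_value={"turn-query": identity, "turn-response": doubled},
)
print(weights)  # [0.5, 0.5]
print(updated)  # [0.5, 1.0]
```

A zero-initialized node produces a zero query, so its first update attends uniformly to its neighbors. The relation-specific value matrices still make the response message count twice as much as the query message in this toy setup.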

4. Inductive Few-shot Graph Learning and Preference Prediction

GMTRouter eschews full-graph transductive training in favor of inductive, subgraph-sampling learning. For each user $u$ and epoch, a subgraph covering a window of that user's recent turns is sampled, and a small supervision set of tuples is held out.

After the HGT layers are applied, preference scores are computed via a cross-attention prediction head and trained with a ranking-oriented entropy loss against the normalized ground-truth rating. Only the GNN and prediction-head parameters are updated; node features from PLM encodings remain fixed. This approach yields effective generalization from few-shot supervision and is robust to feedback sparsity and noise.
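One plausible reading of a ranking-oriented entropy loss is a listwise softmax cross-entropy, sketched below. This is an assumption rather than the paper's stated loss: the predicted scores over candidate models are turned into a distribution with a softmax and compared against ratings normalized to sum to one.

```python
import math


def log_softmax(scores):
    peak = max(scores)
    log_total = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return [s - log_total for s in scores]


def listwise_cross_entropy(scores, ratings):
    """Cross-entropy between softmax(scores) and ratings normalized to sum to 1.

    scores:  predicted preference scores, one per candidate model
    ratings: non-negative ground-truth ratings for the same models
    """
    total = sum(ratings)
    if total <= 0:
        raise ValueError("ratings must contain at least one positive value")
    targets = [r / total for r in ratings]
    return -sum(t * lp for t, lp in zip(targets, log_softmax(scores)))


ratings = [1.0, 0.0]  # the user preferred the first model
aligned = listwise_cross_entropy([2.0, 0.0], ratings)
misaligned = listwise_cross_entropy([0.0, 2.0], ratings)
print(aligned < misaligned)  # True
```

The loss is smaller when the predicted ordering agrees with the user's ratings, so minimizing it pushes the preferred model's score above the others. In the setting described above, gradients from such a loss would reach only the graph network and the prediction head.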

5. Adaptation to New Users and Evolving Preferences

GMTRouter supports rapid adaptation to previously unseen users or users whose preferences change. For a new user with only a small number of past turns, the framework builds a minimal interaction subgraph, initializes the user node embedding to zeros, and, after encoding new queries, responses, and models, infers the user embedding via the HGT. For routing, it selects the model that maximizes the predicted user preference score. No gradient updates or parameter fine-tuning are necessary at inference, enabling efficient deployment in dynamic, user-facing scenarios. As preferences evolve, inference can be re-applied to newly appended turns in the user's ongoing graph, facilitating continual user modeling.
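The inference-only adaptation loop can be sketched with a simple stand-in for the trained network. Here `infer_user_embedding` replaces the frozen HGT forward pass with a feedback-weighted average of model embeddings; the embeddings, the feedback range of [-1, 1], and all names are hypothetical. What the sketch shows is that the user state is recomputed from the growing history and that no parameters change.

```python
def infer_user_embedding(turns, model_embeddings, dim):
    """Stand-in for a frozen forward pass; no parameters are updated.

    turns: list of (model_name, feedback) pairs, feedback in [-1, 1].
    A user with no turns keeps the zero initialization.
    """
    embedding = [0.0] * dim
    for model, feedback in turns:
        for i, value in enumerate(model_embeddings[model]):
            embedding[i] += feedback * value / len(turns)
    return embedding


def route(user_embedding, model_embeddings):
    """Pick the model with the highest dot-product preference score."""
    def score(model):
        vector = model_embeddings[model]
        return sum(u * m for u, m in zip(user_embedding, vector))
    return max(model_embeddings, key=score)


models = {"model_a": [1.0, 0.0], "model_b": [0.0, 1.0]}

turns = []  # brand-new user: no history yet
cold_choice = route(infer_user_embedding(turns, models, 2), models)

turns.append(("model_a", -1.0))  # disliked a response from model_a
turns.append(("model_b", 1.0))   # liked a response from model_b
warm_choice = route(infer_user_embedding(turns, models, 2), models)

print(cold_choice)  # model_a (all scores tie at zero, first candidate wins)
print(warm_choice)  # model_b
```

Appending two turns is enough to change the routing decision in this toy setup, without any training step in between.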

6. Experimental Evaluation: Datasets, Baselines, and Results

GMTRouter's empirical efficacy is established via experiments on four datasets: Chatbot-Arena (11 users, 16 LLMs, human pairwise preferences), MT-Bench (10 users, 2 LLMs, multi-turn reasoning), GSM8K (10 users, 2 LLMs, math problems), and MMLU (5 users, 2 LLMs, multi-domain queries). Evaluation metrics include top-1 model accuracy and AUC-ROC.
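For reference, the two metrics can be computed as in the sketch below. The AUC-ROC uses its pairwise interpretation, the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. The inputs are toy values, not results from the paper.

```python
def top1_accuracy(predicted, preferred):
    """Fraction of queries where the routed model is the preferred one."""
    hits = sum(p == t for p, t in zip(predicted, preferred))
    return hits / len(preferred)


def auc_roc(scores, labels):
    """AUC as the probability that a positive outranks a negative."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


predicted = ["model_a", "model_b", "model_a", "model_b"]
preferred = ["model_a", "model_b", "model_b", "model_b"]
print(top1_accuracy(predicted, preferred))  # 0.75

scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(auc_roc(scores, labels))  # 0.75
```

Top-1 accuracy checks only the final routing decision, while AUC-ROC also rewards well-ordered preference scores, so the two can diverge when scores are poorly calibrated.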

Comparative baselines:

Router | Personalization | Graph Structure | Cost Awareness
--- | --- | --- | ---
Vanilla LLM | No | No | No
Personalized LLM | Yes | No | No
GraphRouter | No | Yes | No
FrugalGPT | No | No | Yes
Averaged over splits, GMTRouter achieves 0.9–21.6 percentage-point gains in accuracy and 0.006–0.309 higher AUC than any baseline. When 30% of users are cold-start, performance drops by less than 1%. Ablations show that removing explicit preference features, the tailored prediction head, heterogeneous message passing, or user nodes each degrades accuracy by 2–20 percentage points.

7. Contributions, Insights, and Practical Considerations

The central contribution of GMTRouter is the formalization of personalized, multi-turn LLM routing as a node classification and ranking task on a user–turn–query–response–model heterogeneous graph reinforced with a relation-aware transformer architecture. The tailored HGT message-passing framework enables explicit propagation of sparse and noisy feedback across relational structures, supported by inductive, few-shot subgraph-based learning. This enables generalization to new users and evolving preferences without retraining.

Empirical results indicate that GMTRouter substantially outperforms both non-personalized and prompt-based approaches, offering improvements of up to 21.6 percentage points in accuracy and 0.309 in AUC. The model is lightweight (approximately 27M parameters, 109 MB, requiring 4.3 GB GPU memory) and hence suitable for production deployment. Potential limitations include dependency on accurate entity extraction and pre-trained LLMs for node initialization. Future directions include dynamic graph updates, richer feedback types, and integration with retrieval-augmented generation (Xie et al., 29 Oct 2025).
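The reported parameter count and file size are consistent with 32-bit weights, as a quick back-of-the-envelope check shows. The float32 storage format is an assumption; the source does not state the precision.

```python
def weights_size_mb(num_parameters, bytes_per_parameter=4):
    """Approximate size of the stored weights in megabytes (10**6 bytes)."""
    return num_parameters * bytes_per_parameter / 1e6


# Roughly 27M parameters at 4 bytes each (float32 assumed).
print(weights_size_mb(27_000_000))  # 108.0
```

The result of about 108 MB matches the quoted 109 MB to within the rounding of "approximately 27M". The 4.3 GB GPU memory figure covers more than the weights and is not something this estimate accounts for.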
