GMTRouter: Personalized LLM Routing Framework
- GMTRouter is a personalized LLM routing framework that models multi-turn interactions as a heterogeneous graph comprising users, queries, responses, models, and turns.
- The paper introduces a novel Heterogeneous Graph Transformer with tailored message-passing and relation-specific attention to enable effective few-shot user modeling without costly retraining.
- Empirical evaluations show accuracy gains of up to 21.6 percentage points and robust performance under cold-start conditions across diverse datasets, highlighting its practical adaptability.
GMTRouter is a personalized LLM routing framework that formulates multi-turn user-LLM interactions as a heterogeneous graph and employs a tailored message-passing mechanism for effective few-shot user modeling and routing. GMTRouter addresses the challenges of idiosyncratic user preferences, data sparsity, feedback noise, and the limitations of existing LLM routing solutions by generalizing to new users and evolving preferences without costly retraining or fine-tuning (Xie et al., 29 Oct 2025).
1. LLM Routing: Motivation, Formalization, and Challenges
LLM routing is the task of selecting, for each user query $q$, an appropriate model $m$ from a candidate set $\mathcal{M}$, with the goal of maximizing user satisfaction under computational constraints. In the personalized, multi-turn setting, a user's full interaction history is represented by
$$\mathcal{H}_u = \{(q_t, m_t, r_t, f_t)\}_{t=1}^{T},$$
where $f_t$ is the user's structured or unstructured feedback after each LLM response $r_t$. The personalized routing problem is thus to learn
$$\pi : (u, q, \mathcal{H}_u) \mapsto m \in \mathcal{M},$$
where $\pi$ is a routing function optimizing user-specific, query-conditioned satisfaction.
Empirical analyses show that inter-user Spearman correlations of LLM rankings are only 44–65% of intra-user consistency, confirming a high degree of user-level idiosyncrasy. Existing routing strategies, including FrugalGPT, C2MAB-V, and GraphRouter, do not capture this complexity: they either ignore longitudinal dialogue structure or cannot reliably encode sparse and noisy feedback into personal user profiles, especially when data are missing or inconsistent.
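To make the inter-user comparison concrete, the following is a minimal sketch of a Spearman rank correlation between two users' model ratings. The rating vectors are hypothetical illustration values, not data from the paper, and the helper names are assumptions introduced here.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)  # undefined if either vector is constant


# Hypothetical mean ratings given by two users to the same four models.
user_a = [0.9, 0.7, 0.4, 0.2]
user_b = [0.3, 0.8, 0.6, 0.1]
rho = spearman(user_a, user_b)
```

A low value of `rho` between users, compared with the correlation between two halves of a single user's history, is the kind of gap the idiosyncrasy finding describes.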
2. Heterogeneous Graph Representation of Multi-turn Interactions
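GMTRouter introduces a novel heterogeneous graph construction to preserve relational structure among users, LLMs, queries, responses, and interaction turns. Specifically, the graph is defined as
$$\mathcal{G} = (\mathcal{V}, \mathcal{E}),$$
with node set decomposition
$$\mathcal{V} = \mathcal{V}_U \cup \mathcal{V}_M \cup \mathcal{V}_Q \cup \mathcal{V}_R \cup \mathcal{V}_T,$$
where $\mathcal{V}_U$ (users), $\mathcal{V}_M$ (models), $\mathcal{V}_Q$ (queries), $\mathcal{V}_R$ (responses), and $\mathcal{V}_T$ (turns) are node types. Edges include user–turn, turn–model, turn–query, turn–response, and sequential turn–turn temporal links.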
Feature matrices for nodes are as follows:
| Node Type | Feature Construction |
|---|---|
| User | Initialized as all zeros |
| Model | PLM embedding of a textual model overview |
| Query | PLM encoding of the query text |
| Response | PLM encoding of the response text |
| Turn | Initialized as all zeros |
User feedback $f_t$ is discretized and projected to a scalar feature attached to the corresponding response node. The adjacency structure can be encoded either as relation-specific adjacency tensors or as per-type neighbor lists, reflecting edge heterogeneity and temporal ordering.
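The construction described in this section can be sketched with plain Python containers. This is an illustrative sketch only: the class `HeteroGraph`, the function `build_user_graph`, the relation names, and the toy feature vectors are all assumptions made here, and real PLM embeddings would replace the short lists.

```python
from dataclasses import dataclass, field


@dataclass
class HeteroGraph:
    """Typed nodes and typed edges (hypothetical container, not the paper's code)."""
    nodes: dict = field(default_factory=dict)  # node_type -> list of features
    edges: dict = field(default_factory=dict)  # (src_type, relation, dst_type) -> [(src, dst)]

    def add_node(self, node_type, features):
        self.nodes.setdefault(node_type, []).append(features)
        return len(self.nodes[node_type]) - 1  # index within its type

    def add_edge(self, src_type, relation, dst_type, src, dst):
        self.edges.setdefault((src_type, relation, dst_type), []).append((src, dst))


def build_user_graph(history, model_features, dim=4):
    """history: (query_vec, model_name, response_vec, feedback) tuples in turn order."""
    g = HeteroGraph()
    user = g.add_node("user", [0.0] * dim)  # user nodes start as all zeros
    models = {name: g.add_node("model", vec) for name, vec in model_features.items()}
    prev_turn = None
    for query_vec, model_name, response_vec, feedback in history:
        turn = g.add_node("turn", [0.0] * dim)  # turn nodes start as all zeros
        query = g.add_node("query", query_vec)
        # Feedback is stored as a scalar feature on the response node.
        response = g.add_node("response", {"x": response_vec, "feedback": feedback})
        g.add_edge("user", "has_turn", "turn", user, turn)
        g.add_edge("turn", "uses", "model", turn, models[model_name])
        g.add_edge("turn", "asks", "query", turn, query)
        g.add_edge("turn", "yields", "response", turn, response)
        if prev_turn is not None:
            g.add_edge("turn", "next", "turn", prev_turn, turn)  # temporal link
        prev_turn = turn
    return g


# Toy two-turn history with made-up 2-dimensional features.
history = [
    ([0.1, 0.2], "model_a", [0.3, 0.4], 1.0),
    ([0.5, 0.6], "model_b", [0.7, 0.8], 0.0),
]
graph = build_user_graph(history, {"model_a": [1.0, 0.0], "model_b": [0.0, 1.0]}, dim=2)
```

Storing edges per `(source type, relation, target type)` key corresponds to the neighbor-list encoding; a tensor encoding would instead keep one adjacency matrix per relation.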
3. Message Passing with a Heterogeneous Graph Transformer
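GMTRouter leverages a Heterogeneous Graph Transformer (HGT) to propagate information across typed nodes and edges. For any node $v$ at layer $l$, the updated representation $h_v^{(l)}$ is computed from $h_v^{(l-1)}$ and an attention-weighted aggregation of messages from its neighbors, where the attention score is computed separately for each edge type using relation-specific parameters.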
This architecture facilitates the flow of feedback $f_t$ from response nodes to user embeddings through turns and incorporates different edge semantics via dedicated relation-specific weights. It thereby integrates sparse feedback and captures the multi-relational dependencies needed for robust user modeling.
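As a rough illustration of relation-specific attention, the sketch below performs one single-head update for a target node, using a separate key and value matrix per relation. It is a simplified, generic HGT-style step written for this article, not the paper's exact update rule: the function names, the residual connection, and the identity weight matrices are assumptions.

```python
import math


def matvec(weight, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relation_attention_update(h_target, neighbors, w_query, w_key, w_value):
    """neighbors: list of (relation, h_source); w_key and w_value map relation -> matrix."""
    query = matvec(w_query, h_target)
    scale = math.sqrt(len(query))
    scores, messages = [], []
    for relation, h_source in neighbors:
        key = matvec(w_key[relation], h_source)  # relation-specific key projection
        scores.append(sum(q * k for q, k in zip(query, key)) / scale)
        messages.append(matvec(w_value[relation], h_source))  # relation-specific message
    alpha = softmax(scores)  # attention weights over all neighbors
    aggregated = [
        sum(a * msg[i] for a, msg in zip(alpha, messages))
        for i in range(len(h_target))
    ]
    # Assumed residual connection: add the aggregated message to the old state.
    updated = [h + agg for h, agg in zip(h_target, aggregated)]
    return updated, alpha


# Toy example: identity weights, one neighbor per relation.
identity = [[1.0, 0.0], [0.0, 1.0]]
relations = {"turn-response": identity, "turn-query": identity}
updated, alpha = relation_attention_update(
    h_target=[1.0, 0.0],
    neighbors=[("turn-response", [1.0, 0.0]), ("turn-query", [0.0, 1.0])],
    w_query=identity,
    w_key=relations,
    w_value=relations,
)
```

With learned rather than identity matrices, each relation can weight and transform its messages differently, which is how feedback stored on response nodes can reach turn and user embeddings over successive layers.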
4. Inductive Few-shot Graph Learning and Preference Prediction
GMTRouter eschews full-graph transductive training in favor of inductive, subgraph-sampling learning. For each user $u$ and epoch, a subgraph $\mathcal{G}_u$ covering the $k$ most recent turns is sampled. A small supervision set of labeled tuples is held out.
After applying $L$ HGT layers, preference scores are computed via a cross-attention prediction head. Training uses a ranking-oriented entropy loss against the normalized ground-truth rating $y$. Only the GNN and prediction-head parameters are updated; node features from PLM encodings remain fixed. This approach yields effective generalization from few-shot supervision and is robust to feedback sparsity and noise.
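The exact loss is not reproduced here. As one plausible reading of a ranking-oriented entropy loss, the sketch below uses a listwise cross-entropy between the softmax of predicted scores and ratings normalized to sum to one. Both the loss form and the function names are assumptions for illustration, not the paper's formula.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_cross_entropy(scores, ratings):
    """Cross-entropy between normalized ratings (target) and softmax(scores)."""
    total = sum(ratings)
    target = [r / total for r in ratings]  # assumed normalization: ratings sum to 1
    probs = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


# Hypothetical ratings for three candidate models on one held-out query.
ratings = [3.0, 2.0, 1.0]
loss_aligned = listwise_cross_entropy([2.0, 1.0, 0.0], ratings)   # same ordering
loss_reversed = listwise_cross_entropy([0.0, 1.0, 2.0], ratings)  # opposite ordering
```

Under this form, scores that rank the models in the same order as the ratings incur a lower loss than scores that reverse the order, which is the behavior a ranking-oriented objective is meant to reward.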
5. Adaptation to New Users and Evolving Preferences
GMTRouter supports rapid adaptation to previously unseen users or users whose preferences change. For a new user $u'$ with $k$ past turns, the framework builds a minimal subgraph $\mathcal{G}_{u'}$, initializes the user embedding to zeros, and, after encoding new queries, responses, and models, infers the user embedding $h_{u'}$ via the HGT. For routing, it selects the model maximizing the predicted user preference score:
$$m^{*} = \arg\max_{m \in \mathcal{M}} \hat{y}(u', q, m).$$
No gradient updates or parameter fine-tuning are necessary at inference, enabling efficient deployment in dynamic, user-facing scenarios. As preferences evolve, inference can be re-applied to newly appended turns in the user's ongoing graph, facilitating continual user modeling.
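The routing step itself reduces to an argmax over candidate models. The sketch below assumes the user, query, and model embeddings have already been produced by a frozen encoder; `dot_score` is a hypothetical stand-in for the trained prediction head, and all vectors are toy values.

```python
def route(user_embedding, query_embedding, model_embeddings, score_fn):
    """Return the candidate with the highest predicted preference score."""
    scores = {
        name: score_fn(user_embedding, query_embedding, emb)
        for name, emb in model_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


def dot_score(user, query, model):
    """Hypothetical scorer: dot product of (user + query) with the model embedding."""
    context = [u + q for u, q in zip(user, query)]
    return sum(c * m for c, m in zip(context, model))


# Toy embeddings; in practice these come from the trained, frozen network.
candidates = {"model_a": [1.0, 0.0], "model_b": [0.0, 1.0]}
best, scores = route([0.2, 0.8], [0.1, 0.1], candidates, dot_score)
```

Because routing only calls the scorer in forward mode, adapting to a new user amounts to recomputing the user embedding from the updated subgraph and calling `route` again.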
6. Experimental Evaluation: Datasets, Baselines, and Results
GMTRouter's empirical efficacy is established via experiments on four datasets: Chatbot-Arena (11 users, 16 LLMs, human pairwise preferences), MT-Bench (10 users, 2 LLMs, multi-turn reasoning), GSM8K (10 users, 2 LLMs, math problems), and MMLU (5 users, 2 LLMs, multi-domain queries). Evaluation metrics include top-1 model accuracy and AUC-ROC.
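For reference, both metrics can be computed in a few lines. This sketch uses the pairwise definition of AUC-ROC (the probability that a randomly chosen positive outscores a randomly chosen negative); the inputs are made-up values, not results from the paper.

```python
def top1_accuracy(predicted, preferred):
    """Fraction of queries where the routed model is the user's preferred one."""
    hits = sum(p == t for p, t in zip(predicted, preferred))
    return hits / len(preferred)


def auc_roc(scores, labels):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))  # requires at least one of each class


# Made-up predictions for illustration.
accuracy = top1_accuracy(["a", "b", "a"], ["a", "b", "b"])
auc = auc_roc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

The pairwise form is quadratic in the number of examples, which is acceptable for small evaluation sets; a sort-based computation is preferable at scale.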
Comparative baselines:
| Router | Personalization | Graph Structure | Cost Awareness |
|---|---|---|---|
| Vanilla LLM | No | No | No |
| Personalized LLM | Yes | No | No |
| GraphRouter | No | Yes | No |
| FrugalGPT | No | No | Yes |
On average across splits, GMTRouter achieves gains of 0.9–21.6 percentage points in accuracy and 0.006–0.309 in AUC over the baselines. With 30% cold-start users, performance drops by less than 1%. Ablations show that removing explicit preference features, the tailored prediction head, heterogeneous message passing, or user nodes each degrades accuracy by 2–20 percentage points.
7. Contributions, Insights, and Practical Considerations
The central contribution of GMTRouter is the formalization of personalized, multi-turn LLM routing as a node classification and ranking task on a user–turn–query–response–model heterogeneous graph reinforced with a relation-aware transformer architecture. The tailored HGT message-passing framework enables explicit propagation of sparse and noisy feedback across relational structures, supported by inductive, few-shot subgraph-based learning. This enables generalization to new users and evolving preferences without retraining.
Empirical results indicate that GMTRouter substantially outperforms both non-personalized and prompt-based approaches, offering improvements of up to 21.6 percentage points in accuracy and 0.309 in AUC. The model is lightweight (approximately 27M parameters, 109 MB, requiring 4.3 GB GPU memory) and hence suitable for production deployment. Potential limitations include dependency on accurate entity extraction and pre-trained LLMs for node initialization. Future directions include dynamic graph updates, richer feedback types, and integration with retrieval-augmented generation (Xie et al., 29 Oct 2025).