
Arch-Router: Aligning LLM Routing with Human Preferences (2506.16655v1)

Published 19 Jun 2025 in cs.CL

Abstract: With the rapid proliferation of LLMs -- each optimized for different strengths, style, or latency/cost profile -- routing has become an essential technique to operationalize the use of different models. However, existing LLM routing approaches are limited in two key ways: they evaluate performance using benchmarks that often fail to capture human preferences driven by subjective evaluation criteria, and they typically select from a limited pool of models. In this work, we propose a preference-aligned routing framework that guides model selection by matching queries to user-defined domains (e.g., travel) or action types (e.g., image editing) -- offering a practical mechanism to encode preferences in routing decisions. Specifically, we introduce \textbf{Arch-Router}, a compact 1.5B model that learns to map queries to domain-action preferences for model routing decisions. Our approach also supports seamlessly adding new models for routing without requiring retraining or architectural modifications. Experiments on conversational datasets demonstrate that our approach achieves state-of-the-art (SOTA) results in matching queries with human preferences, outperforming top proprietary models. Our approach captures subjective evaluation criteria and makes routing decisions more transparent and flexible. Our model is available at: \texttt{https://huggingface.co/katanemo/Arch-Router-1.5B}.

Summary

  • The paper introduces a novel framework that decouples route selection from model assignment using a Domain–Action taxonomy to align LLM routing with human preferences.
  • The paper achieves 93.17% overall routing accuracy and 98.11% turn accuracy while reducing latency to 51ms, outperforming proprietary baselines.
  • The paper demonstrates a modular design that enables policy updates without retraining, ensuring maintainability and transparent human-in-the-loop optimization.

Preference-Aligned Routing with Arch-Router: A Practical Framework for Human-Centric LLM Orchestration

"Arch-Router: Aligning LLM Routing with Human Preferences" (2506.16655) addresses the increasing complexity of deploying multiple LLMs to serve diverse user needs. As organizations transition from monolithic to multi-model LLM systems—driven by distinct strengths in cost, latency, style, and task—routing mechanisms have become essential. Existing routers, however, are typically optimized for aggregate benchmark performance and lack adaptability to users’ subjective, context-specific preferences, posing practical barriers to adoption in real-world settings.

This work introduces Arch-Router, a 1.5B parameter generative model that enables explicit alignment of LLM routing with human preferences. The core innovation is a framework that decouples route selection (deciding which “policy” best matches a query) from model assignment (deciding which LLM processes that policy), centered around a user-defined Domain–Action taxonomy. This approach enables organizations to encode evolving, fine-grained preferences directly into routing policies, which can be updated or augmented at inference time without necessitating retraining or architectural change.

Domain–Action Taxonomy and Routing Mechanism

The Domain–Action taxonomy organizes route policies as semantically meaningful tuples (domain, action)—e.g., (finance, code generation) or (healthcare, summarization). Each route policy is defined in natural language and serves as an anchor for associating LLMs with specific user intents. The routing process has two stages:

  1. Policy Selection: Given a user query and the set of available policies, Arch-Router generates the identifier of the most appropriate route policy using a prompt that explicitly enumerates all current policies. Unlike classifier-based routers with fixed output spaces, the generative approach enables seamless onboarding of new (policy, LLM) pairs at runtime.
  2. Model Assignment: A mapping table assigns each policy to a preferred LLM. This decoupling confers both modularity and transparency: policies—articulated by non-technical users if needed—can be iteratively refined, and LLMs swapped or added without impact on the learned routing logic. This property is particularly consequential for enterprises responding to regulatory, performance, or ethical requirements.
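The two stages above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's implementation: the policy names, model names, and prompt format are assumptions, and `select_policy` stands in for a call to the Arch-Router model itself.

```python
# Hypothetical sketch of the two-stage routing flow. Policy names, model
# names, and the prompt format are illustrative assumptions.

# Route policies: (domain, action) intents described in natural language.
POLICIES = {
    "finance_code_gen": "Writing or modifying code for financial applications.",
    "healthcare_summarize": "Summarizing clinical or healthcare documents.",
    "general_chat": "Open-ended conversation not covered by other policies.",
}

# Model assignment table, kept separate from policy selection so LLMs can be
# swapped or added without touching the learned routing logic.
POLICY_TO_MODEL = {
    "finance_code_gen": "gpt-4o",
    "healthcare_summarize": "claude-3-5-sonnet",
    "general_chat": "llama-3-8b",
}

def build_router_prompt(query: str) -> str:
    """Enumerate all current policies in the prompt, so new (policy, LLM)
    pairs can be onboarded at inference time without retraining."""
    listing = "\n".join(f"- {name}: {desc}" for name, desc in POLICIES.items())
    return f"Policies:\n{listing}\n\nQuery: {query}\nBest policy:"

def route(query: str, select_policy) -> str:
    """select_policy stands in for the Arch-Router model; it returns a policy
    identifier, which the mapping table resolves to a concrete LLM."""
    policy = select_policy(build_router_prompt(query))
    return POLICY_TO_MODEL.get(policy, POLICY_TO_MODEL["general_chat"])
```

Because the router generates a policy identifier rather than predicting over a fixed label space, extending `POLICIES` and `POLICY_TO_MODEL` at runtime is sufficient to add a new route.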

Data Creation Pipeline

The methodology incorporates a two-phase data generation framework to produce robust training and evaluation data:

  • Phase 1 (Foundation): Clean, labeled conversations and curated policy descriptions are automatically synthesized and validated using LLMs. Diverse, realistic intents are captured by scoping data generation to real-world domains and use cases.
  • Phase 2 (Augmentation): Dataset robustness is increased via noise injection, policy perturbation, and scenario mixing, ensuring the model learns to disambiguate user intent in ambiguous, noisy conditions and across multi-turn dialogue.

This pipeline produces training data reflecting the types of ambiguity, topic drift, and policy collisions encountered in operational deployments—crucial for model generalizability beyond benchmark datasets.
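The Phase 2 operations can be illustrated with minimal stand-ins. The paper describes noise injection, policy perturbation, and scenario mixing only at a high level, so the function bodies below are our own simplified assumptions of what each transform does.

```python
import random

# Illustrative stand-ins for the Phase 2 augmentation transforms; the
# specific implementations are assumptions, not the paper's pipeline.

def inject_noise(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Randomly drop characters to simulate typos and transcription noise."""
    rng = random.Random(seed)
    return "".join(c for c in text if rng.random() > rate)

def perturb_policy(description: str) -> str:
    """Policy perturbation: reword a policy description so the router cannot
    memorize surface forms (a trivial substitution stands in for paraphrasing)."""
    return description.replace("Summarizing", "Producing summaries of")

def mix_scenarios(turns_a: list, turns_b: list) -> list:
    """Interleave two conversations to create topic drift across turns."""
    mixed = []
    for a, b in zip(turns_a, turns_b):
        mixed.extend([a, b])
    return mixed
```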

Experimental Evaluation

Comprehensive experiments are conducted on four public conversational datasets (CLINC-150, MANtIS, SGD, LMSYS-1M), each adapted to include annotated policies per the proposed taxonomy. Three levels of accuracy are measured: per-turn, span (contiguous route consistency), and conversation (exact match throughout all turns).
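The three accuracy levels can be made concrete with a short sketch. Assuming each conversation turn carries a gold policy label and a predicted one (the data representation here is our assumption), the metrics reduce to:

```python
# Sketch of the three evaluation levels: per-turn, span, and conversation
# accuracy. The sequence-of-labels representation is an assumption.

def turn_accuracy(gold: list, pred: list) -> float:
    """Fraction of individual turns routed to the correct policy."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)

def span_accuracy(gold: list, pred: list) -> float:
    """Fraction of maximal contiguous same-policy spans in the gold sequence
    that the prediction reproduces exactly (contiguous route consistency)."""
    spans, start = [], 0
    for i in range(1, len(gold) + 1):
        if i == len(gold) or gold[i] != gold[start]:
            spans.append((start, i))
            start = i
    hit = sum(all(pred[j] == gold[j] for j in range(s, e)) for s, e in spans)
    return hit / len(spans)

def conversation_accuracy(gold: list, pred: list) -> float:
    """1.0 only if every turn in the conversation matches."""
    return float(gold == pred)
```

For example, with gold routes `["a", "a", "b"]` and predictions `["a", "b", "b"]`, turn accuracy is 2/3, span accuracy is 1/2 (only the final `b` span is intact), and conversation accuracy is 0.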

Numerical highlights:

  • Arch-Router achieves an overall routing accuracy of 93.17%, surpassing the best proprietary baseline by 7.71 percentage points.
  • On fine-grained action-policy matches (fQfA), Arch-Router records 98.11% turn accuracy.
  • Multi-turn and span-level accuracy are superior, confirming strong context tracking.
  • Latency is roughly an order of magnitude lower than proprietary APIs (51ms vs. >500ms), making deployment viable for latency-sensitive applications.
  • The model is robust to irrelevance and ambiguous queries, closely matching or outperforming major proprietary models on these dimensions.

Practical implications of these results are demonstrated via a coding-session case study. Arch-Router correctly routes all user turns, dynamically interpreting ambiguous follow-ups and context-dependent requests (e.g., error diagnosis or performance optimization, even when the user refers to those intents only obliquely).

Comparative Analysis and Trade-Offs

The authors provide a rigorous analysis contrasting preference-aligned and performance-based routing:

  • Performance-based routing leverages automated metrics to maximize output quality under cost or latency constraints. This approach, however, is brittle when task boundaries blur or subjective factors dominate the definition of “good” performance.
  • Preference-aligned routing—as implemented by Arch-Router—prioritizes human-specified policies over model-predicted metrics, supporting scenarios where subjective evaluation criteria (tone, style, ethics) outweigh aggregate benchmark results.

Limitations include:

  • Routing accuracy is dependent on the design and clarity of the user-defined policy set; semantic overlap or underspecification can lead to ambiguity.
  • The ultimate efficacy of the system is bounded by the user’s competence in assigning models to policies.

Practical Implications and Future Directions

Arch-Router’s architecture demonstrates practical advances for both researchers and practitioners:

  • System Maintainability: The offline definition of policies and the ability to update model assignments without retraining support continuous deployment in dynamic environments.
  • Human-in-the-loop Optimization: The transparent, auditable mechanism allows iterative refinement, bridging the gap between technical and non-technical stakeholders.
  • Resource Efficiency: The small model size and high accuracy at low latency contribute to significant cost and energy savings, removing a critical barrier to adoption in resource-constrained settings.
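The maintainability claim amounts to a workflow in which operators edit configuration, not model weights. A minimal sketch, with hypothetical policy and model names of our own choosing:

```python
# Sketch of a no-retraining maintenance workflow: the router reads the
# current policy set from its prompt, so edits below take effect on the
# next request. All names are illustrative assumptions.

policies = {"travel_booking": "Booking flights, hotels, or itineraries."}
policy_to_model = {"travel_booking": "gpt-4o"}

def swap_model(policy: str, new_model: str) -> None:
    """Redirect an existing policy, e.g. for cost or regulatory reasons."""
    policy_to_model[policy] = new_model

def add_policy(name: str, description: str, model: str) -> None:
    """Onboard a new (policy, LLM) pair; the generative router sees the new
    description in its enumerated prompt on the next request."""
    policies[name] = description
    policy_to_model[name] = model

swap_model("travel_booking", "claude-3-5-sonnet")
add_policy("image_editing", "Editing or generating images.", "gemini-2.5-pro")
```

Because both tables are plain data, they can also be audited and versioned, which supports the human-in-the-loop refinement described above.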

Potential directions for future work include hybrid frameworks blending subjective policy alignment with objective performance metrics; methods for improved policy set generation and refinement; and larger-scale studies in highly dynamic, multi-tenant LLM environments.

In conclusion, the Arch-Router framework and associated methodology present a viable, tested path for operationalizing preference-aligned, interpretable, and high-throughput LLM routing in practical AI systems.
