Arch-Router: Preference-Aligned LLM Routing

Updated 22 October 2025
  • Arch-Router is an advanced LLM routing framework that aligns human preferences with domain-action policies to optimize model selection for diverse conversational tasks.
  • The architecture employs a compact 1.5B parameter model that jointly processes user queries, conversation context, and XML-formatted policy descriptors for flexible, modular routing.
  • Performance evaluation shows Arch-Router achieves 93.17% accuracy with 51 ms latency, outperforming commercial baselines and supporting real-time deployment.

Arch-Router refers to a family of model-routing frameworks designed to orchestrate LLMs according to user preferences, domain intent, and subjective evaluation criteria rather than static benchmarks or performance proxy metrics. Arch-Router's distinguishing innovation is the alignment of routing decisions with human preferences, expressed through a taxonomy of domains and actions, which allows flexible, interpretable, and high-accuracy selection among multiple specialized models. The core implementation, Arch-Router-1.5B, is a compact generative model that processes user queries and conversational history alongside natural-language route policies to select the optimal route. Decoupling the route policy from the model mapping adds architectural flexibility and supports seamless integration of new models without retraining. Arch-Router achieves state-of-the-art accuracy and low latency for LLM routing, outperforming commercial and proprietary baselines in conversational settings (Tran et al., 19 Jun 2025).

1. Preference-Aligned LLM Routing: Motivation and Principles

Conventional LLM routers focus on static, task-driven selection or optimization for measurable metrics (e.g., accuracy on fixed datasets). However, such approaches often fail to reflect the nuanced, subjective criteria that drive human acceptance of model outputs—such as style, appropriateness, domain expertise, or user intent. Arch-Router addresses this gap by:

  • Utilizing a Domain–Action taxonomy, where each route policy is described in natural language, capturing both the task domain (e.g., travel, finance) and the intended action (e.g., summarization, code generation).
  • Allowing end-users or system administrators to define routing criteria, rendering the system adaptable to dynamic, task-specific, or subjective preferences.

This approach facilitates the encoding of flexible, user-aligned policies and supports transparent, interpretable decision making for LLM routing.
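The Domain–Action taxonomy can be illustrated with a small policy set. The policy names, domains, and descriptions below are hypothetical examples in the spirit of the paper, not its exact policies:

```python
# Hypothetical Domain-Action route policies, following the paper's taxonomy.
# Names and descriptions are illustrative, not the paper's exact policies.
ROUTE_POLICIES = [
    {"name": "travel_summarize",
     "domain": "travel",
     "action": "summarization",
     "description": "Summarize travel itineraries, reviews, or booking details."},
    {"name": "code_generate",
     "domain": "coding",
     "action": "code generation",
     "description": "Write new code from a natural-language specification."},
    {"name": "finance_general",
     "domain": "finance",
     "action": None,  # domain-only policy: coarse fallback when no action matches
     "description": "Any finance-related request without a more specific action."},
]

def policies_for_domain(domain):
    """Return all policies registered under a given domain."""
    return [p for p in ROUTE_POLICIES if p["domain"] == domain]
```

Because each policy is plain natural language, administrators can add or reword entries without touching model weights.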

2. Model Architecture and Technical Implementation

Arch-Router is instantiated as a 1.5B parameter decoder-style LLM that jointly considers the user’s query, dialog context, and a set of route policy descriptions, each provided in natural language. The core architectural properties include:

  • Input prompt containing: (i) the current query, (ii) full conversation context, and (iii) a list of policy descriptions with unique route identifiers, formatted in XML for deterministic parsing.
  • The generative head produces exactly the matching route name, enforced via cross-entropy loss over the target route identifier.
  • The architecture processes all prompts as raw text, eschewing separate embedding or classification heads.
  • Routing is decoupled: the model selects a route identifier (policy) as output, and a table-based mapping assigns this policy to a specific LLM endpoint; the mapping layer is easily updated for model pool changes, requiring no model retraining.

This approach ensures that the routing framework is both modular and extensible.
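The prompt-assembly and output-validation steps above can be sketched as follows. The XML tag names and the "other" fallback route are illustrative assumptions, not the paper's exact template:

```python
from xml.sax.saxutils import escape

def build_router_prompt(query, history, policies):
    """Assemble the router input: candidate policies and dialog turns are
    wrapped in XML so the prediction can be extracted deterministically.
    Tag names here are illustrative, not the paper's exact template."""
    policy_block = "\n".join(
        f'  <route name="{escape(p["name"])}">{escape(p["description"])}</route>'
        for p in policies)
    history_block = "\n".join(
        f'  <turn role="{escape(t["role"])}">{escape(t["text"])}</turn>'
        for t in history)
    return (f"<routes>\n{policy_block}\n</routes>\n"
            f"<conversation>\n{history_block}\n</conversation>\n"
            f"<query>{escape(query)}</query>")

def validate_route(model_output, policies):
    """The generative head emits only a route name; check it against the
    candidate set and fall back to a catch-all route otherwise."""
    names = {p["name"] for p in policies}
    out = model_output.strip()
    return out if out in names else "other"
```

Since the model's entire output is a single route identifier, no structured decoding or logit masking is required to parse it.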

3. Training Objective and Prompt Engineering

Arch-Router is trained using supervised fine-tuning (SFT) via cross-entropy, formalized as:

$$\min_\theta\ \mathcal{L}\big(\mathcal{F}_{\mathrm{Arch}}(x),\ c_{\mathrm{true}}\big)$$

where $\mathcal{F}_{\mathrm{Arch}}$ is the router model, $x$ is the prompt (comprising query, context, and policy set), and $c_{\mathrm{true}}$ is the ground-truth route label. The prompt format (Table 1 in (Tran et al., 19 Jun 2025)) uses explicit XML wrappers to encode the candidate policies and dialog turns, enabling deterministic extraction of the model's prediction.

The prompt template ensures strict adherence to output structure—only the target route name is generated. This structure, in conjunction with the inclusion of full conversation history, allows for context-aware, interpretable selection. The model is able to handle policies at varying granularity (domain-only, domain-action, or even highly specialized heuristics).
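The objective above reduces, in a toy single-token view, to cross-entropy over the candidate route identifiers. This sketch (pure Python, with hypothetical logits; the real model scores full token sequences) shows the computation:

```python
import math

def route_cross_entropy(logits, target_route, routes):
    """Cross-entropy of the router's distribution over candidate route
    identifiers against the ground-truth route. A toy, single-token view
    of the SFT objective; the actual model is trained on token sequences."""
    m = max(logits)                               # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]      # unnormalized softmax terms
    total = sum(exps)
    probs = [e / total for e in exps]             # softmax over routes
    return -math.log(probs[routes.index(target_route)])
```

For example, with uniform logits over three routes the loss is $\log 3 \approx 1.0986$, and it shrinks toward zero as the logit of the true route dominates.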

4. Performance Evaluation and Comparative Metrics

Arch-Router’s performance was assessed on diverse intent-classification and conversation-routing datasets, including CLINC150, MANtIS, SGD, and LMSYS-1M. Measurements were performed at turn-level, span-level, and full conversation level, reflecting the framework’s accuracy in both atomic and multi-turn contexts. Key results are:

  • Overall routing accuracy of 93.17%.
  • An average margin of 7.71% over commercial models (e.g., GPT-4, Claude).
  • Superior performance in handling both fine-grained and coarse domain routing, as well as irrelevant queries.
  • Mean response latency of ~51 ms (±12 ms), substantially lower than established baselines, supporting real-time deployment scenarios.

Table: Comparative Turn-Level Routing Accuracy

Method           Accuracy (%)   Latency (ms)
Arch-Router      93.17          51 ± 12
GPT-4 (prop.)    ≤ 85.5         > 250
Claude (prop.)   ≤ 85.5         > 150

All numbers from (Tran et al., 19 Jun 2025).

5. Preference Encoding: Domain–Action Taxonomy and Policy Management

Arch-Router employs user-configurable route policies, each described by a Domain and an Action in natural language. This structure supports:

  • Fine-grained matching between queries and specialist models (e.g., coding, bug fixing, API help).
  • Coarse routing when only domain is identified, facilitating flexible fallbacks.
  • Transparent, user-auditable decision-making: route and rationale can be audited as part of the system logs.

Policy management is further enhanced by decoupling the router model from the route-to-model mapping, allowing system designers to introduce, modify, or deprecate policies and target models dynamically.
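A minimal sketch of this decoupled route-to-model mapping, with hypothetical endpoint names, might look like:

```python
class RouteTable:
    """Sketch of the decoupled route-to-model mapping: policies and target
    endpoints can be added, swapped, or retired at runtime with no router
    retraining. Endpoint names below are illustrative assumptions."""

    def __init__(self, default_endpoint):
        self.default = default_endpoint   # fallback model for unmapped routes
        self.table = {}

    def register(self, route_name, endpoint):
        """Introduce or re-point a route without touching the router model."""
        self.table[route_name] = endpoint

    def retire(self, route_name):
        """Deprecate a route; queries fall back to the default endpoint."""
        self.table.pop(route_name, None)

    def resolve(self, route_name):
        """Map the router's predicted route name to a concrete endpoint."""
        return self.table.get(route_name, self.default)
```

Because the router only ever emits route names, swapping `"code-model"` for a newer endpoint is a one-line table update.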

6. Applications and Deployment Scenarios

Arch-Router is suitable for any compositional, multi-model environment where aligning LLM selection to nuanced user requirements, task types, or subjective preferences is paramount. Representative use cases include:

  • Conversational AI systems: Routing general dialog, technical questions, or creative tasks to domain-optimized models.
  • Workflow systems: Selecting between models specialized in summarization, code synthesis, translation, etc.
  • Knowledge workers: Multi-turn coding assistance, with policies for bug fixing, optimization, and documentation retrieval.
  • Real-time intent-based API selection: Routing requests to image, audio, or non-LLM models based on parsed action intent.
  • Customer support: Aligning agent responses with product, region, or regulatory requirements without retraining the routing core.

Arch-Router’s design supports augmentation of policy sets and model pools on-the-fly, with no retraining cost.

7. Future Directions and Research Prospects

Potential directions highlighted for Arch-Router include:

  • Hybrid frameworks combining subjective preference-aligned routing with quantitative performance metrics for scenarios where both human and objective criteria should contribute to model selection.
  • Extension of user policy descriptions to richer taxonomies, supporting nuanced intent, tone, and ethical/region-specific constraints.
  • Automated policy onboarding for rapid expansion of supported tasks and action types using only natural language descriptions, facilitating organic scaling.
  • Further reduction in latency and computational cost for mobile or embedded deployments.
  • Expansion to support multi-modal and non-LLM models by extending the input policy language.

These directions reflect ongoing research interest in aligning LLM system orchestration with actual human evaluative standards and practical requirements.


In summary, Arch-Router (Tran et al., 19 Jun 2025) represents a substantial advancement in LLM routing by integrating preference-aligned policy selection, decoupled route-to-model mapping, and high-performance generative modeling within a low-latency, extensible framework. Its mechanism for encoding subjective, user-defined criteria directly in the routing prompt enables transparent, controllable, and auditable operation, setting a new standard for model orchestration in real-world, multi-agent LLM environments.
