SkillOrchestra: Learning to Route Agents via Skill Transfer

Published 23 Feb 2026 in cs.AI and cs.LG | (2602.19672v1)

Abstract: Compound AI systems promise capabilities beyond those of individual models, yet their success depends critically on effective orchestration. Existing routing approaches face two limitations: (1) input-level routers make coarse query-level decisions that ignore evolving task requirements; (2) RL-trained orchestrators are expensive to adapt and often suffer from routing collapse, repeatedly invoking one strong but costly option in multi-turn scenarios. We introduce SkillOrchestra, a framework for skill-aware orchestration. Instead of directly learning a routing policy end-to-end, SkillOrchestra learns fine-grained skills from execution experience and models agent-specific competence and cost under those skills. At deployment, the orchestrator infers the skill demands of the current interaction and selects agents that best satisfy them under an explicit performance-cost trade-off. Extensive experiments across ten benchmarks demonstrate that SkillOrchestra outperforms SoTA RL-based orchestrators by up to 22.5% with 700x and 300x learning cost reduction compared to Router-R1 and ToolOrchestra, respectively. These results show that explicit skill modeling enables scalable, interpretable, and sample-efficient orchestration, offering a principled alternative to data-intensive RL-based approaches. The code is available at: https://github.com/jiayuww/SkillOrchestra.

Summary

The paper introduces a framework that uses explicit, transferable skill abstractions to route agents efficiently and mitigate routing collapse.
The methodology employs a Skill Handbook with mode-level insights, a skill registry, and agent profiles, improving accuracy by up to 22.5% over RL-based orchestrators while cutting learning cost by roughly 700x relative to Router-R1.
The approach improves data efficiency and interpretability in multi-turn tasks, offering a scalable alternative to traditional RL-based orchestrators.

Introduction

SkillOrchestra (2602.19672) addresses a critical challenge in multi-agent AI systems: enabling interpretable, data-efficient orchestration of LLMs and tools to solve complex, multi-turn tasks. The proposed framework departs from previous paradigms dominated by query-level model routing or RL-based agent orchestration by introducing explicit intermediate skill abstractions. This approach enables state-conditioned, fine-grained routing, mitigates routing collapse, and produces modular orchestration knowledge that transfers across orchestrator architectures.

Limitations of Prior Orchestration Approaches

Traditional model routing frameworks operate primarily at the query level, performing static selection of models and lacking the capacity to adapt to the evolving requirements of multi-turn, agentic workflows (see Figure 1). RL-trained orchestrators partially address the sequential decision process but suffer from high sample and training costs, poor adaptability to evolving model/tool pools, and policy collapse—degenerating to repeatedly using a single model regardless of context (see left panel of Figure 2). These limitations restrict both efficiency and the ability to match agent competence to fine-grained capability demands.

Figure 1: Comparative characteristics of model routing, direct agent orchestration, and skill-aware agent orchestration, highlighting the explicit capability modeling central to SkillOrchestra.

SkillOrchestra: Framework Overview

SkillOrchestra reframes orchestration as skill-grounded decision making. The key innovation is a Skill Handbook—an explicit, reusable experience base encoding mode-specific execution insights, a registry of fine-grained, reusable skills, and agent profiles summarizing competence and cost per skill. Orchestration is decomposed into two phases: (1) mode selection (determining the required capability class, e.g., search, code), and (2) skill-grounded agent routing (matching the required skill set to agents with the best expected competence-cost trade-off).

Figure 3: SkillOrchestra architecture: (left) skill discovery and competence estimation; (middle) Pareto-based selection of a handbook for a given orchestrator; (right) deployment-time mode and skill-aware routing.

This skill-centric approach confers several key advantages:

The orchestrator reasons over explicit capability requirements rather than learning implicit, entangled policies.
It enables sample-efficient, interpretable, and stable routing without full end-to-end RL.
The Skill Handbook is modular and transferable: it can be constructed offline and reused across different backbone orchestrators and changing agent pools.

Skill Handbook Structure and Learning

The Skill Handbook encompasses: (i) mode-level routing and transition insights; (ii) a registry of skills, where each skill is defined as a reusable abstraction for a capability with natural language descriptors and contextual indicators; (iii) agent profiles with skill-conditioned success probabilities and cost statistics.
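The three components listed above map naturally onto a small set of record types. The following is a minimal sketch of one possible in-memory layout, not the authors' implementation: every class name, field name, and example value here is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """A reusable capability abstraction (field names are illustrative)."""
    name: str
    mode: str                      # e.g. "search" or "code"
    description: str               # natural-language descriptor
    indicators: list[str] = field(default_factory=list)  # contextual cues


@dataclass
class SkillStats:
    """Per-(agent, skill) statistics: Beta pseudo-counts plus a mean cost."""
    alpha: float = 1.0             # assumed uniform Beta(1, 1) prior
    beta: float = 1.0
    mean_cost: float = 0.0


@dataclass
class AgentProfile:
    agent_id: str
    stats: dict[str, SkillStats] = field(default_factory=dict)  # keyed by skill name


@dataclass
class SkillHandbook:
    mode_insights: dict[str, list[str]] = field(default_factory=dict)
    skills: dict[str, Skill] = field(default_factory=dict)
    profiles: dict[str, AgentProfile] = field(default_factory=dict)

    def skills_for_mode(self, mode: str) -> list[str]:
        """Return the names of skills registered under a mode."""
        return [s.name for s in self.skills.values() if s.mode == mode]


# Hypothetical handbook with one insight, one skill, and one agent profile.
handbook = SkillHandbook()
handbook.mode_insights["search"] = ["Prefer search when the question needs external facts."]
handbook.skills["multi_hop_fact_linking"] = Skill(
    name="multi_hop_fact_linking",
    mode="search",
    description="Chain facts across several retrieved documents.",
    indicators=["which", "both"],
)
handbook.profiles["small-model"] = AgentProfile(
    agent_id="small-model",
    stats={"multi_hop_fact_linking": SkillStats(alpha=4.0, beta=8.0, mean_cost=0.2)},
)
```

Keeping skills, insights, and profiles in separate maps mirrors the modularity claim: a new agent can be added by writing one more profile without touching the skill registry.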

Skill discovery leverages contrastive analysis between successful and failed trajectories, distilling their capability gaps into new skill definitions via LLM-based abstraction. Agent profiles are constructed by aggregating skill-level performance statistics across traces, modeled via Beta posteriors for success estimation. The skill set is incrementally refined via splitting (when agent variance across a skill is large) and merging (for indistinguishable skills), guided by statistical analysis and LLM-based reflection.
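The Beta posterior for a single agent-skill pair can be illustrated with the standard conjugate Beta-Bernoulli update. This is a sketch under assumptions: the uniform Beta(1, 1) prior and the outcome list are hypothetical, and the paper may weight or attribute outcomes differently.

```python
def update_beta(alpha: float, beta: float, successes: int, failures: int) -> tuple[float, float]:
    """Conjugate Beta-Bernoulli update: add observed counts to the pseudo-counts."""
    return alpha + successes, beta + failures


def posterior_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical trace outcomes for one (agent, skill) pair: True = success.
outcomes = [True, True, False, True, True, False, True, True]
n_success = sum(outcomes)
n_failure = len(outcomes) - n_success

# Start from an assumed uniform Beta(1, 1) prior.
a, b = update_beta(1.0, 1.0, n_success, n_failure)
estimate = posterior_mean(a, b)
```

With six successes and two failures, the posterior is Beta(7, 3) and the estimated success probability is 0.7. The prior keeps estimates away from 0 or 1 when only a few traces exist for a pair.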

Figure 4: Instantiation of a Skill Handbook: left, mode-level routing insights; middle, hierarchical registry of skills; right, agent profiles encoding skill competence and cost.

Matching the handbook's granularity to the orchestrator's reasoning capacity is critical: overly fine-grained skills can induce misidentification errors, while coarse skills sacrifice specialization. SkillOrchestra employs Pareto-optimal validation to select the handbook subset that yields the best empirical performance-cost trade-off for the target orchestrator.
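One way to picture Pareto-based handbook selection is to score each candidate granularity on a validation set, discard dominated candidates, and pick among the rest. The sketch below assumes a scalarized tie-break `accuracy - lam * cost`; the function names, the `lam` values, and the validation numbers are all hypothetical and not taken from the paper.

```python
def pareto_front(candidates: dict[str, tuple[float, float]]) -> list[str]:
    """Return names of candidates not dominated in (higher accuracy, lower cost).

    `candidates` maps a name to an (accuracy, cost) pair.
    """
    front = []
    for name, (acc, cost) in candidates.items():
        dominated = any(
            (o_acc >= acc and o_cost <= cost) and (o_acc > acc or o_cost < cost)
            for other, (o_acc, o_cost) in candidates.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


def select_handbook(candidates: dict[str, tuple[float, float]], lam: float) -> str:
    """Pick the frontier candidate maximizing accuracy - lam * cost."""
    front = pareto_front(candidates)
    return max(front, key=lambda n: candidates[n][0] - lam * candidates[n][1])


# Hypothetical validation results for three handbook granularities.
validation = {
    "coarse": (0.62, 1.0),
    "medium": (0.71, 1.2),
    "fine": (0.69, 1.5),
}
front = pareto_front(validation)
choice = select_handbook(validation, lam=0.1)
```

In this made-up example the fine-grained handbook is dominated by the medium one (lower accuracy at higher cost), which matches the observation that more detail can hurt a given orchestrator.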

Agent Orchestration via Skill Transfer

At inference, orchestration proceeds by first retrieving a task- and orchestrator-conditioned subset of the Skill Handbook. The orchestrator then samples the next operational mode using mode-level execution metadata, identifies the skills active in the current context, and selects the agent that maximizes expected skill competence minus cost, where competence aggregates weighted per-skill Beta posterior means from the agent profiles (see Section 4.3; Algorithm in App. A). This explicit dependence on interpretable, transferable skill metadata yields robust allocations: the orchestrator escalates to stronger models only when lower-cost alternatives are insufficient, and tool usage is minimized by explicit skill requirement matching.
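The selection rule described above can be sketched as a weighted average of posterior means minus a cost penalty. This is an illustrative reading, not the algorithm from the paper's appendix: the normalization by total weight, the Beta(1, 1) fallback for unobserved pairs, and all agent names, counts, and costs are assumptions.

```python
def route(active_skills: dict[str, float],
          profiles: dict[str, dict[str, tuple[float, float]]],
          costs: dict[str, float],
          lam: float) -> str:
    """Pick the agent maximizing weighted mean competence minus lam * cost.

    active_skills: skill name -> weight (weights need not sum to 1).
    profiles: agent -> skill -> (alpha, beta) Beta pseudo-counts.
    costs: agent -> estimated cost per call.
    """
    total_w = sum(active_skills.values())

    def score(agent: str) -> float:
        competence = 0.0
        for skill, w in active_skills.items():
            # Fall back to a uniform Beta(1, 1) prior for unobserved pairs.
            a, b = profiles[agent].get(skill, (1.0, 1.0))
            competence += w * a / (a + b)
        return competence / total_w - lam * costs[agent]

    return max(profiles, key=score)


# Hypothetical two-agent pool.
profiles = {
    "small": {"arithmetic": (9.0, 1.0), "multi_hop": (3.0, 7.0)},
    "large": {"arithmetic": (9.5, 0.5), "multi_hop": (8.0, 2.0)},
}
costs = {"small": 0.05, "large": 0.5}

easy_pick = route({"arithmetic": 1.0}, profiles, costs, lam=0.5)
hard_pick = route({"multi_hop": 1.0}, profiles, costs, lam=0.5)
```

Here the cheap agent wins the arithmetic step because its competence is nearly as high, while the multi-hop step escalates to the larger agent because the competence gap outweighs the cost penalty.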

Empirical Results

SkillOrchestra is evaluated in both model routing and full agent orchestration setups across ten QA, reasoning, and mathematical benchmarks. In all regimes, SkillOrchestra lies on the empirical Pareto frontier of performance vs. cost.

Figure 5: Performance-cost tradeoffs for multi-turn model routing and agent orchestration: SkillOrchestra achieves higher accuracy at lower or equal cost compared to all baselines.

Notable outcomes:

On model routing, SkillOrchestra outperforms all baselines, including the RL-trained Router-R1 [zhang2025routerr], with up to +22.5 absolute accuracy and roughly 700x lower learning cost than Router-R1.
The Skill Handbook provides strong data efficiency: comparable or superior performance is achieved with two orders of magnitude fewer training samples.
Routing collapse is effectively mitigated: SkillOrchestra distributes queries across specialized models according to their competence/cost profiles, in contrast to RL approaches which tend to collapse to a single large model (see left of Figure 2).
The Skill Handbook is transferable: when moved from a Qwen2.5-3B backbone to stronger orchestrators and model pools, performance improves further without retraining (right panel, Figure 2).

Figure 2: (Left) RL-based Router-R1 exhibits routing collapse to the largest model; SkillOrchestra achieves balanced allocation. (Right) Skill Handbook transfers to stronger orchestrators, consistently improving accuracy.
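Routing collapse can be made concrete by looking at how calls are distributed across the model pool. The sketch below computes per-model call shares and a normalized entropy; this diagnostic is an assumption chosen for illustration rather than a metric reported by the paper, and both call logs are invented.

```python
import math
from collections import Counter


def routing_shares(calls: list[str]) -> dict[str, float]:
    """Fraction of calls sent to each model."""
    counts = Counter(calls)
    return {model: c / len(calls) for model, c in counts.items()}


def normalized_entropy(shares: dict[str, float], pool_size: int) -> float:
    """Shannon entropy of the routing distribution divided by log(pool_size).

    0 means every call goes to one model; 1 means a uniform spread.
    """
    if pool_size < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in shares.values() if p > 0)
    return h / math.log(pool_size)


# Hypothetical call logs over a pool of three models.
collapsed = ["large"] * 99 + ["small"]
balanced = ["large"] * 30 + ["medium"] * 30 + ["small"] * 40

collapsed_shares = routing_shares(collapsed)
collapsed_entropy = normalized_entropy(collapsed_shares, pool_size=3)
balanced_entropy = normalized_entropy(routing_shares(balanced), pool_size=3)
```

A maximum share close to 1 together with a normalized entropy close to 0 signals collapse. A balanced entropy alone does not imply good routing, so it should be read alongside accuracy and cost.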

In agent orchestration (including tool selection and multi-modal flows), SkillOrchestra systematically outperforms RL-based ToolOrchestra (Su et al., 26 Nov 2025) and commercial proprietary orchestrators (e.g., GPT-5, Claude Opus 4.5, Gemini 3 Pro), achieving superior accuracy with lower overall inference cost.

Ablation analysis reveals that each component (explicit skills, refinement, handbook selection) is essential to achieving the best performance-cost trade-off; both over-fragmented and overly coarse skills degrade results. The modular structure supports scalable integration as agent libraries evolve.

Theoretical and Practical Implications

SkillOrchestra demonstrates that explicit, transferable skill modeling is a scalable solution for compound agent orchestration. By shifting optimization from monolithic routing policies to modular skill abstractions, orchestration becomes more adaptive, interpretable, and cost-efficient. This skill transfer paradigm enables the isolation and propagation of domain- and mode-specific competence, facilitating rapid adaptation as models, tools, and task distributions evolve.

Practically, this framework can serve as a blueprint for future multi-agent composition systems, mitigating inference cost explosion in large pools and improving fault tolerance and interpretability. Theoretically, this modular approach invites further work in compositional RL, multi-agent coordination, and meta-learning, where explicit abstraction and modular knowledge organization are central concerns.

Conclusion

SkillOrchestra introduces a new paradigm for agent orchestration in compound AI systems—centered on explicit, modular, and transferable skill modeling. It delivers significant improvements in both accuracy and cost efficiency over heuristic, discriminative, and RL-based baselines, while enabling stable, interpretable, and scalable orchestration policies. Its empirical and algorithmic findings position skill transfer as a compelling alternative for deploying robust, efficient agentic systems in open, evolving AI ecosystems.

Explain it Like I'm 14

What is this paper about?

This paper introduces SkillOrchestra, a new way to run complex AI systems that use many models and tools together. Instead of relying on one big model to do everything, SkillOrchestra acts like a smart conductor that chooses the right “musician” (AI agent) for each part of a task based on the specific skills needed and the cost. The goal is to make AI systems more accurate, cheaper, and easier to manage.

What questions does it try to answer?

To make the idea clear, here are the main questions the paper explores:

How can we choose the best AI model or tool at each step of a multi-step task, instead of making one big decision at the start?
Can we avoid the common problem where a router keeps picking the strongest (but most expensive) model over and over, even when a lighter model would do just fine?
Is there a way to plan AI decisions using “skills” (like searching the web, solving math, or writing code) so that choices are more reliable, cheaper, and reusable?
Can we build a reusable “Skill Handbook” that helps different orchestrator models make better decisions without retraining?

How did they do it?

Think of an AI system like a sports team or an orchestra:

Different players (or musicians) have different strengths.
The coach (or conductor) shouldn’t just pick the star player every time—it should choose the person with the right skill for the moment.
Also, you need to keep track of how good each player is at specific skills, what it costs to use them, and when each skill matters.

SkillOrchestra works in three simple steps:

1) Build a Skill Handbook

The Skill Handbook is a reusable guide that contains three parts:

Mode-level insights: Tips on when to use a mode like “Search,” “Code,” or “Answer.”
Skills: Fine-grained abilities (for example, “symbolic logic,” “data processing,” “high-precision math,” “multi-hop fact linking”).
Agent profiles: For each agent (a model plus its allowed tools), estimates of how likely it is to succeed on each skill and how much it costs (like tokens or time).

How it’s built (in everyday terms):

The system runs sample tasks using different agents.
When one attempt succeeds and another fails, it asks: “What skill was missing?” Then it records that skill in the handbook.
Over time, it keeps refining the skill list—splitting vague skills into clearer ones or merging duplicates—so the handbook stays clean and useful.
It also tracks how different agents perform on each skill and how expensive they are to use.

2) Pick the right handbook version for your orchestrator

Not all orchestrators (the “conductor” models) are equally capable. A very fine-grained skill list might confuse a smaller orchestrator. So SkillOrchestra selects the best level of skill detail for the target orchestrator using validation, aiming for the best accuracy at the lowest cost (this is called choosing a point on the “Pareto frontier,” where you can’t get better accuracy without paying more cost, or vice versa).

3) Use the handbook during a task

When solving a multi-step problem:

The orchestrator first chooses the mode (e.g., Search vs. Code) based on the handbook’s guidance.
Then it identifies which skill is needed right now.
Finally, it picks the agent that’s best for that skill and balances performance vs. cost.

In short: at each step, the system asks “what skill do we need?” and “who’s best at that skill for a reasonable price?”

Helpful analogies for key terms

Orchestration: The conductor deciding who plays next and what they play.
Agent: A model plus tools (like a coder model with Python access or a general model with web search).
Mode: The type of action (Search, Code, Answer)—like switching from strings to percussion in an orchestra.
Skill: A specific ability needed right now (like “multi-hop reasoning” or “precision arithmetic”).
Reinforcement learning (RL): Teaching by trial-and-error with rewards for good outcomes.
Routing collapse: When the router keeps picking the same powerful model every time, even when it’s wasteful.
Pareto frontier: The set of “best trade-offs” where improving accuracy would raise cost, and lowering cost would reduce accuracy.

What did they find, and why is it important?

Here are the main results, explained simply:

Better accuracy at lower cost:
- SkillOrchestra beat strong RL-based systems by up to 22.5 percentage points in accuracy on tough benchmarks.
- It also ran cheaper, often cutting total cost by around 2× in math tasks and significantly reducing costs in QA and full agent orchestration scenarios.
Avoids routing collapse:
- RL routers tended to call one massive model almost all the time (over 98% of the time in tests), which is very expensive and wastes specialized models.
- SkillOrchestra spread calls across models based on the skills needed, using big models only when necessary and relying on lighter models for simpler steps.
Much cheaper to train:
- Compared to RL systems like Router-R1 and ToolOrchestra, SkillOrchestra reduced learning (training) costs by roughly 700× and 300×, respectively, because it learns a reusable handbook from execution traces instead of doing heavy end-to-end RL.
Transfers across orchestrators:
- The same Skill Handbook worked with different orchestrator backbones (small or large models) without retraining and consistently improved performance, especially with stronger orchestrators.
Works beyond model selection:
- SkillOrchestra also boosts full agent orchestration (multiple modes and tools: Search, Code, Answer). It reached the highest accuracy while keeping costs lowest among strong baselines, including RL-tuned systems and proprietary models.
More skills aren’t always better:
- The best results came from discovering skills, refining them (splitting/merging), and selecting the right granularity for the orchestrator. Too fine-grained can confuse weaker orchestrators; too coarse can reduce precision.

Why this matters: It shows that organizing AI decisions around skills—rather than blindly “training the router harder”—can be more accurate, more stable, cheaper, and easier to reuse.

What’s the bigger impact?

If adopted widely, SkillOrchestra could:

Make complex AI systems smarter and cheaper by choosing the right agent at the right time.
Reduce wasteful spending on big models when smaller ones would do.
Provide interpretable, reusable orchestration knowledge (the Skill Handbook) that can be shared, updated, and transferred as model pools change.
Cut down on data-hungry RL training, making deployment faster and more practical.
Help future AI systems handle complex, multi-step tasks (like research and scientific discovery) in a more reliable, skill-aware way.

In short: SkillOrchestra is like giving AI a playbook of skills and a coach who knows which player to send in. It leads to smarter choices, lower costs, and better teamwork among models and tools.

Knowledge Gaps

Knowledge gaps, limitations, and open questions

The paper introduces SkillOrchestra and reports promising results, but several aspects remain missing, uncertain, or unexplored. Future work could address the following gaps:

Skill discovery reliability and reproducibility: The LLM-based “discoverer/reflector” that defines and refines skills is heuristic and underspecified. How stable are discovered skills across runs, LLMs, seeds, and datasets, and how much human oversight is required to ensure consistent taxonomies?
Determining active skills at runtime: The method states that the orchestrator “identifies a set of relevant skills” $\Sigma_t$ and uses “indicators” and nearest-neighbor retrieval, but provides no concrete algorithm or evaluation of detection accuracy. What is the precision/recall of skill activation, and how sensitive is routing to misidentified skills?
Skill granularity selection mechanisms: Pareto validation is used to pick skill granularity, yet the procedure risks overfitting to the validation set and $\lambda$ choices. How robust is this selection across domains, budgets, and orchestrator capacities, and can it adapt online as conditions change?
Agent competence modeling assumptions: Success probabilities are estimated independently per skill with Beta-Bernoulli updates. This ignores correlations between skills, context dependencies, and non-stationarity (e.g., model updates, prompt changes). Can competence be modeled with hierarchical, contextual, or drift-aware methods with uncertainty calibration?
Exploration vs. exploitation and uncertainty use: Routing uses posterior means but no explicit treatment of uncertainty for decision-making. How should the system balance exploration of under-observed agents/skills and robust exploitation, especially in cold-start or shifting environments?
Mode selection learning and termination: The paper avoids RL but does not specify how mode policies are learned, updated, or when to stop (e.g., when to switch from search to code or to terminate). What learning signals or algorithms (supervised, bandit, imitation) are used for mode transitions and horizon control?
Skill compositionality and interactions: The aggregator uses a weighted average over skills, assuming additivity. How to handle interactions (synergy/conflict

Glossary

Agent instantiation: Formal definition of how an agent is constructed from a model and a subset of tools. "An agent is defined as a pair $A = (m, \mathcal{T}_A), \quad m \in \mathcal{M}, \; \mathcal{T}_A \subseteq \mathcal{T}$,"
Agent orchestration: Coordinating multiple models and tools across modes to accomplish a task. "We next evaluate whether SkillOrchestra extends beyond model routing to full agent orchestration, where the system must coordinate multiple operational modes and tools beyond model selection."
Agent profile: A summary of an agent’s competence and cost characteristics conditioned on skills and modes. "An agent profile summarizes an agent’s mode-conditioned competence, cost, and routing characteristics for skill-aware orchestration."
Agentic task environment: A setting where agents perform multi-step reasoning in response to a query. "We consider an agentic task environment where a user instruction $q \in \mathcal{Q}$ initiates a multi-step reasoning process."
Beta distribution: A probabilistic model used to estimate success probabilities for agent-skill pairs. "we model success probability as $\phi_{A,\sigma} \sim \mathrm{Beta}(\alpha_{A,\sigma}, \beta_{A,\sigma})$ "
Cascade strategies: Heuristic routing methods that escalate queries based on difficulty or budget. "Early approaches rely on heuristic or cascade strategies~\citep{chen2024frugalgpt} that escalate queries based on predicted difficulty or budget constraints"
Chain-of-Thought (CoT): A prompting technique that elicits step-by-step reasoning from LLMs. "CoT~\citep{wei2022chain}"
Competence-aware agent routing: Selecting agents based on estimated skill competence and cost under the current mode. "Competence-Aware Agent Routing. Conditioned on the selected mode $\psi_t$ , the orchestrator identifies a set of relevant skills $\Sigma_t \subseteq \Sigma_{\psi_t}$ "
End-to-end: Training or optimizing a routing policy directly as a single process. "enables data-efficient, transferable, and more balanced orchestration without end-to-end RL training."
Execution trace: The outputs produced by an agent’s operation at a turn (e.g., search results, code). "The selected agent produces an execution trace $z_t$ (e.g., search results or generated code),"
Foundation models: Large pretrained models serving as backbones for agents. "A set of candidate foundation models $\mathcal{M} = \{m_1, \dots, m_{K_M}\}$ ,"
GRPO: A reinforcement learning algorithm used to finetune orchestrators. "Prior work typically instantiates this optimization via RL such as GRPO~\citep{shao2024deepseekmath}"
GraphRouter: A graph-based model routing approach. "GraphRouter~\citep{feng2025graphrouter}"
KNN Router: A similarity-based routing method using nearest neighbors. "KNN Router~\citep{hu2024routerbench}"
Latency: Time delay used as part of execution cost modeling. "the estimated execution cost (e.g., latency, token usage) under mode $\psi$ "
Mode policy: The policy that selects the operational mode at each turn. "The mode policy $\pi_{\text{mode}}$ selects the current operational mode $\psi_t$ based on the interaction state $s_t$ "
Mode–skill index: A mapping from modes to associated skills used to constrain routing. "Mode--skill index $\mathcal{E}$ . The graph structure induces a mapping $M : \Psi \rightarrow 2^{\Sigma}$ "
Model routing: Selecting a model from a pool to answer a query, often in single-shot settings. "A common form of orchestration is model routing, where a controller selects a model from a model pool"
Multi-hop QA: Question answering tasks requiring reasoning across multiple pieces of information. "Multi-hop QA: HotpotQA~\citep{yang2018hotpotqa}, 2WikiMultiHopQA~\citep{ho-etal-2020-constructing}, Musique~\citep{trivedi-etal-2022-musique}, and Bamboogle~\citep{press-etal-2023-measuring}"
Multi-turn interactions: Dialogues or workflows involving multiple sequential turns. "This assumption breaks down in multi-turn interactions, where different states require distinct capabilities."
Operational modes: Abstract action types (e.g., search, code) defining allowed tools and operations. "Operational Modes ( $\Psi$ ). A set of abstract action modes $\Psi = \{search, code, \dots\}$ defined at the capability level."
Orchestrator: The controller that plans and allocates resources across agents and modes. "The Orchestrator ( $\mathcal{O}$ ): A central controller responsible for high-level planning and resource allocation."
Orchestrator-specific subset: A tailored selection of handbook content matched to an orchestrator’s capacity. "we therefore select an orchestrator-specific subset $\mathcal{H}_{\text{base}}^{(\mathcal{O})}$ for orchestrator $\mathcal{O}$ via Pareto-optimal validation"
Pareto frontier: The set of solutions that optimally trade off performance and cost. "SkillOrchestra and SkillOrchestra+ lie on the Pareto frontier, achieving higher accuracy at lower cost than all baselines."
Pareto-optimal validation: Validation procedure to choose handbook subsets that optimize accuracy–cost trade-offs. "we therefore select an orchestrator-specific subset ... via Pareto-optimal validation"
Policy factorization: Decomposing the overall policy into mode selection and routing components. "We factorize the policy as $\pi(a_t \mid s_t) = \pi_{\text{mode}}(\psi_t \mid s_t)\cdot \pi_{\text{route}}(A_t \mid s_t, \psi_t)$,"
Posterior mean: The mean of the posterior distribution used to aggregate competence estimates. "aggregating the posterior means over the active skill set"
Proximal Policy Optimization (PPO): A reinforcement learning algorithm used to train routing policies. "Router-R1~\citep{zhang2025routerr}, a strong PPO-trained~\citep{schulman2017proximal} multi-turn router"
RAG (Retrieval-Augmented Generation): A method that augments generation with retrieved information. "RAG as in~\citet{zhang2025routerr}"
Reinforcement learning (RL): Framework for learning sequential decision policies via rewards. "Recent RL-based orchestration methods~\citep{zhang2025routerr,toolorchestra} address this by learning sequential routing policies with LLMs."
Routing collapse: Degeneration of a routing policy to repeatedly select a single option despite alternatives. "routing collapse: the degeneration of the orchestration policy into repeatedly selecting a single option at one or more decision levels"
Semantic alignment: Matching the current state’s semantics to agent profiles during routing. "optionally incorporating semantic alignment between the current state and the agent profile"
Skill Handbook: A reusable knowledge base of skills, mode insights, and agent profiles guiding orchestration. "we learn a Skill Handbook $\mathcal{H}$ , a reusable experience base that captures (i) mode-level execution insights ... (ii) fine-grained skills ... (iii) agent profiles ..."
Skill-aware orchestration: Orchestration paradigm that reasons over explicit skills rather than direct policy optimization. "we introduce SkillOrchestra, a framework for skill-aware orchestration."
Skill registry: The catalog of discovered skills organized within the handbook. "The handbook maintains a registry of skills (Definition~\ref{def:skill}),"
State-conditioned: Decisions that depend on the current interaction state rather than just input-level features. "First, it enables state-conditioned, fine-grained orchestration, allowing different models to specialize across capabilities."
Success probability: The probability an agent succeeds on a given skill, modeled probabilistically. "we model success probability as $\phi_{A,\sigma} \sim \mathrm{Beta}(\alpha_{A,\sigma}, \beta_{A,\sigma})$ "
Trajectory: The sequence of states, actions, traces, and observations over turns. "This interaction induces a trajectory $\tau = (s_0, a_0, z_0, o_0, s_1, a_1, z_1, o_1, \dots, s_T)$ ."
Trajectory-level rewards: Rewards computed over entire execution trajectories to optimize accuracy–cost. "optimizing performance-cost trade-offs via trajectory-level rewards."
Tradeoff hyperparameter: A scalar λ weighting performance against cost in the objective. " $\lambda$ is a tradeoff hyperparameter."
Tool pool: The set of executable tools available to agents within the environment. "Tool Pool ( $\mathcal{T}$ ). A set of executable tools $\mathcal{T} = \{ t_1, \dots, t_{K_T} \}$,"

Practical Applications

Immediate Applications

The following applications can be deployed with current technology and workflows using the paper’s SkillOrchestra framework, Skill Handbook, and Pareto-optimal selection strategy.

Cost-aware LLM gateways for enterprise model routing (Software, Finance)
- Use case: Deploy SkillOrchestra to dynamically route queries across a pool of LLMs, balancing accuracy and per-token cost to avoid routing collapse (e.g., always choosing the largest model).
- Tools/products/workflows: “Skill Handbook Builder” from production logs; routing SDK for LangChain/LangGraph/AutoGen; dashboards showing Pareto trade-offs; policy controls to enforce budgets.
- Assumptions/dependencies: Access to multiple models with cost telemetry; collection of execution traces; reliable success/failure signals per task.
Multi-turn customer support assistants with balanced model/tool use (Industry: Customer Service, CRM)
- Use case: Route among modes like retrieve (RAG/CRM search), analyze (sentiment/classification), and answer, invoking higher-cost models only when skill demands warrant it.
- Tools/products/workflows: CRM connectors, knowledge base search, ticketing systems; skill-grounded orchestration policies; audit trails via skill descriptions.
- Assumptions/dependencies: Accurate mapping of support intents to skills; secure integrations with customer data; tool availability (search, summarization).
Software engineering copilots with competence-aware escalations (Software)
- Use case: In IDEs/CI, select coding, debugging, or test-generation modes; route to specialized coding LLMs or run test suites, escalating to a stronger model only on complex skills (e.g., symbolic logic or refactoring).
- Tools/products/workflows: Orchestration in VS Code/JetBrains; Python/containers for execution; competence estimates per coding subskill.
- Assumptions/dependencies: Code execution sandbox; skill registry aligned to code tasks; access to code-specialist models.
Data analytics assistants that orchestrate SQL, visualization, and narrative synthesis (Software, Enterprise Analytics)
- Use case: Decompose tasks into query building, data cleaning, visualization, and summarization; route to DB tools and appropriate models with explicit cost-performance trade-offs.
- Tools/products/workflows: DB connectors, BI tools; mode/skill-aware routing; governance for data access.
- Assumptions/dependencies: Stable mapping from analytics tasks to skills; role-based access control; telemetry on query success.
Content moderation and review pipelines with selective deep reasoning (Media/Trust & Safety)
- Use case: Apply fast classifiers for clear cases; invoke high-precision reasoning models only for borderline content (e.g., nuanced policy violations).
- Tools/products/workflows: Moderation APIs, policy skill registry; agent profiles capturing false-positive/false-negative tendencies.
- Assumptions/dependencies: Label definitions and skills aligned with policy; legal/compliance review of routing logic.
Legal document analysis with mode-aware agent orchestration (LegalTech)
- Use case: Flow through retrieve clauses, structure extraction, risk assessment, and synthesis steps; choose agents based on competence in specific skills (e.g., clause similarity, obligation extraction).
- Tools/products/workflows: Document search, clause extraction tools; orchestration of summarization vs. analysis models; audit-ready skill definitions.
- Assumptions/dependencies: Reliable ground truth or heuristics for success; confidentiality controls; stable skill ontology for legal tasks.
Scientific and literature research assistants (Academia, Pharma R&D)
- Use case: Run multi-step research tasks (search → screening → synthesis → hypothesis refinement); route to coding (e.g., simple simulations), structured extraction, and summarization agents based on skill demands.
- Tools/products/workflows: PubMed/semantic search APIs; Python notebooks; skill registry for evidence grading.
- Assumptions/dependencies: Integration with scholarly databases; evaluation metrics per skill; responsible use of external tools and data.
Education tutoring agents with domain-specific skill routing (Education)
- Use case: Distinguish between math reasoning, code execution (checking solutions), and reading comprehension; route to specialized agents per skill with cost controls for student usage.
- Tools/products/workflows: Classroom LMS integration; sandboxed code runners; math/domain skill profiles.
- Assumptions/dependencies: Accurate skill detection at student query level; fairness and accessibility considerations.
Incident response copilots for DevOps (Software/IT Ops)
- Use case: Triage logs, retrieve runbooks, propose commands, and execute diagnostics; use cheaper models for routine checks and escalate when symbolic reasoning or complex coding is required.
- Tools/products/workflows: Log ingestion, runbook search; shell/infra APIs; Pareto validation to tune granularity for the orchestrator.
- Assumptions/dependencies: Safe command execution; robust detection of active skills; token cost visibility across tools.
Governance and auditing of AI orchestration decisions (Policy, Risk & Compliance)
- Use case: Replace opaque RL routers with interpretable, skill-grounded handbooks; enable auditors to trace why modes and agents were selected under explicit trade-offs.
- Tools/products/workflows: Handbook versioning; compliance dashboards; policy constraints on routing (e.g., data locality).
- Assumptions/dependencies: Stable skill definitions; logs of mode/agent choices; data privacy and model vendor compliance.
Procurement optimization for model/tool vendors (Finance, Public Sector)
- Use case: Use Pareto-optimal handbook selection to evaluate model pools; quantify performance-cost trade-offs before buying seats/tokens; enforce budget allocations via routing policies.
- Tools/products/workflows: Benchmarking harness; cost simulators; agent profile management.
- Assumptions/dependencies: Comparable metrics across vendors; reliable pricing and latency data; representative validation set.
Personal productivity assistants with multi-mode orchestration (Daily Life)
- Use case: Plan tasks involving search, spreadsheet formulas, email drafting; invoke code execution for automation; select models by skill and price constraints to fit personal budgets.
- Tools/products/workflows: Calendar/email APIs; spreadsheet/automation tools; skill-aware routing on-device or via cloud.
- Assumptions/dependencies: Access to personal data with consent; small orchestrator capacity may require coarser skill granularity.
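
The pattern that recurs in the use cases above (prefer cheap agents, escalate only when the active skill demands it) can be sketched as a score that trades estimated competence against cost. This is a minimal illustration, not the paper's implementation: the `AgentProfile` structure, the `route` function, the linear `competence - lam * cost` score, the 0.5 default for unseen skills, and all numbers are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class AgentProfile:
    """Hypothetical per-agent profile: estimated success rate per skill plus a cost."""
    name: str
    cost: float  # assumed normalized cost per call, in [0, 1]
    competence: dict[str, float] = field(default_factory=dict)


def route(skill: str, agents: list[AgentProfile], lam: float = 0.5) -> AgentProfile:
    """Pick the agent maximizing competence(skill) - lam * cost.

    The linear trade-off and the 0.5 prior for unseen skills are
    illustrative assumptions, not values taken from the paper.
    """
    def score(agent: AgentProfile) -> float:
        return agent.competence.get(skill, 0.5) - lam * agent.cost

    return max(agents, key=score)


# Made-up profiles for a coding assistant.
agents = [
    AgentProfile("small-coder", cost=0.1,
                 competence={"test_generation": 0.85, "refactoring": 0.40}),
    AgentProfile("large-reasoner", cost=0.9,
                 competence={"test_generation": 0.90, "refactoring": 0.90}),
]

print(route("test_generation", agents).name)  # small-coder
print(route("refactoring", agents).name)      # large-reasoner
```

Raising `lam` makes the router more cost-sensitive; with `lam=2.0` the cheap agent is chosen even for `refactoring`. Domain constraints such as data locality could be applied by filtering `agents` before scoring.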

Long-Term Applications

These applications build on the paper’s innovations but require further research, scaling, standardization, or development to achieve robust deployment.

Cross-domain skill registries and marketplaces (Software, Ecosystem)
- Vision: Standardized, shareable Skill Handbooks across organizations and frameworks; interoperable skill ontologies that make agent pools plug-and-play.
- Dependencies: Community standards for skill schemas; security/privacy for shared traces; tooling for merging/splitting skills across domains.
Safety-critical orchestration in healthcare workflows (Healthcare)
- Vision: Clinical documentation, coding (ICD/SNOMED), evidence retrieval, and synthesis with audited, skill-grounded routing; selective escalation to certified agents.
- Dependencies: Regulatory approval; rigorous validation datasets; robust uncertainty handling; HIPAA-compliant data flows.
Autonomous scientific discovery agents (Academia, Pharma R&D)
- Vision: Orchestrate experiment design, simulation coding, data analysis, and literature reasoning; reuse handbooks across labs and domains.
- Dependencies: High-quality traces; domain-specific toolchains; advanced skill refinement for novel tasks; reproducibility infrastructure.
Real-time robotics orchestration (Robotics)
- Vision: Route among perception, planning, control, and language explanation modes; balance compute, latency, and accuracy under skill demands.
- Dependencies: Hard real-time constraints; edge deployment; agent profiles with latency/energy costs; safety and verification.
Grid and energy operations assistants (Energy)
- Vision: Skill-aware routing for forecasting, anomaly detection, optimization, and policy reporting; selective deep reasoning for contingencies.
- Dependencies: Integration with SCADA; domain-specific skill definitions; robust evaluation of success states; cybersecurity measures.
Financial research and compliance copilots (Finance)
- Vision: Orchestrate data ingestion, risk modeling, backtesting, and narrative reporting; competence-aware routing for quantitative vs. textual tasks.
- Dependencies: Data governance; auditability; skill granularity tuned to model capacity; model/tool validation in regulated contexts.
Dynamic price-aware procurement and model switching (Policy, Enterprise)
- Vision: Real-time orchestration that adapts to changing API pricing/latency/service levels; automatic Pareto re-selection of handbooks.
- Dependencies: Reliable provider telemetry; SLAs for routing changes; governance policies for dynamic switching.
Automated skill refinement with meta-learning (Software/ML)
- Vision: Replace LLM reflectors with meta-learning agents that split/merge skills from outcomes; continuous learning pipelines.
- Dependencies: Stable training signals; avoidance of over-fragmentation; generalization guarantees; data-efficient updates.
On-device and edge skill-aware orchestration (IoT, Mobile)
- Vision: Lightweight orchestrators with coarser skill granularity operating offline/low-latency; selective cloud escalation for hard skills.
- Dependencies: Efficient local models/tools; privacy-preserving traces; adaptive handbook selection for limited capacity.
OS-level agent orchestration services (Platform)
- Vision: Native skill-aware routing integrated into operating systems/frameworks (e.g., “Agent OS”), standardizing modes and tool access across apps.
- Dependencies: Platform APIs; standardization of operational modes; security/capability isolation; developer tooling.
Education platforms with individualized skill routing (Education)
- Vision: Longitudinal student models informing skill detection and routing; personalized budgets; transparent learning pathways.
- Dependencies: Reliable student skill assessment; fairness/ethics; integration with curricula; teacher oversight.
Public-sector digital service orchestration (Policy, Government)
- Vision: Triage and process citizen requests with interpretable skill-based routing; enforce budget, fairness, and transparency mandates.
- Dependencies: Policy frameworks; audit infrastructure; multilingual skill definitions; robust failure handling.

Notes on Assumptions and Dependencies

Across applications, feasibility depends on:

- Access to multi-model pools and external tools; telemetry on cost (tokens/latency).
- Collection of execution traces to learn and refine the Skill Handbook; reliable success/failure labeling per skill/mode.
- Orchestrator capacity alignment with handbook granularity via Pareto-optimal selection; weaker orchestrators benefit from coarser skills.
- Data privacy, security, and regulatory compliance when routing across vendors/tools; audit trails enabled by skill-level explanations.
- Robustness of LLM-based skill discovery and refinement; potential need for domain experts or meta-learning to validate skills.
- Domain-specific evaluation metrics and governance to prevent harmful or biased routing decisions.
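
Pareto-optimal selection, mentioned above as the way to match handbook granularity to orchestrator capacity, can be illustrated with a small sketch. It assumes each candidate handbook has been scored on a validation set for accuracy and cost. The `Candidate` type, the `pareto_front` and `select` helpers, the budget rule, and all numbers are hypothetical and are not taken from the paper.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    """Hypothetical validation result for one handbook granularity."""
    name: str
    accuracy: float  # higher is better
    cost: float      # lower is better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Return the candidates that no other candidate dominates."""
    def dominates(a: Candidate, b: Candidate) -> bool:
        no_worse = a.accuracy >= b.accuracy and a.cost <= b.cost
        strictly_better = a.accuracy > b.accuracy or a.cost < b.cost
        return no_worse and strictly_better

    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


def select(candidates: list[Candidate], budget: float) -> Optional[Candidate]:
    """Most accurate non-dominated candidate within budget, or None."""
    affordable = [c for c in pareto_front(candidates) if c.cost <= budget]
    return max(affordable, key=lambda c: c.accuracy, default=None)


# Made-up validation results for three handbook granularities.
results = [
    Candidate("coarse", accuracy=0.70, cost=1.0),
    Candidate("medium", accuracy=0.78, cost=1.5),
    Candidate("fine", accuracy=0.76, cost=2.5),  # dominated by "medium"
]

print([c.name for c in pareto_front(results)])  # ['coarse', 'medium']
print(select(results, budget=1.2).name)         # coarse
```

In this made-up data the finest handbook is both costlier and less accurate than the medium one, so it drops off the frontier. This mirrors the observation above that an orchestrator may do better with coarser skills.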
