Online Knapsack Composer Framework
- Online Knapsack Composer is a dynamic framework that models component selection as a knapsack optimization problem under budget constraints.
- It integrates semantic retrieval, threshold-based filtering, and sandbox testing to empirically evaluate and select high-performing agentic components.
- Empirical results report up to a 31.6% increase in success rates for single-agent tasks and notable improvements in multi-agent configurations.
The Online Knapsack Composer is a structured, automated framework that formulates the selection and assembly of agentic components (such as tools, APIs, or agents) as a constrained online optimization problem, directly inspired by the classical knapsack paradigm. Given a complex task, a budget constraint, and a potentially large inventory of reusable resources, the composer systematically identifies and dynamically tests candidate components, modeling and maximizing their real-time utility under budget and compatibility requirements. By integrating semantic retrieval, threshold-based value-to-cost filtering, and empirical sandbox testing, the framework robustly adapts the composition strategy to evolving demands and distinct operational domains, ensuring cost-effective and high-performing agentic systems in both single-agent and multi-agent scenarios (Yuan et al., 18 Oct 2025).
1. Framework Structure and Optimization Model
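The core of the Online Knapsack Composer is the explicit modeling of agentic system assembly as a discrete optimization problem analogous to the classic knapsack:
$$\max_{S \subseteq \mathcal{I}} \; P(\text{success} \mid t, S) \quad \text{subject to} \quad \sum_{i \in S} c_i \le B$$
where:
- $S$ is the selected subset of components from the inventory $\mathcal{I}$,
- $c_i$ is the cost of component $i$,
- $B$ is the budget,
- $P(\text{success} \mid t, S)$ is the system-level success probability for task $t$ using components in $S$.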
The composition proceeds through several critical phases:
- Skill extraction: The task is parsed to yield a set of required skills, each with associated queries.
- Retrieval and pre-filtering: Candidate components are initially selected based on semantic similarity to each skill description.
- Knapsack-based selection: Offline composers use linear programming (LP) or a greedy approximation that favors components with high utility-to-cost ratios, subject to skill coverage and budget feasibility.
- Online knapsack composition: Incorporates dynamic, real-time measurements via sandboxed testing, refining value estimates for each candidate and updating selection thresholds adaptively.
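The knapsack-based selection phase can be illustrated with a greedy value-to-cost heuristic. This is a minimal sketch, assuming each candidate already carries an estimated value and a cost; the names `Component` and `greedy_select` and the example inventory are hypothetical, skill-coverage constraints are omitted, and it is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    value: float  # estimated utility for the task (assumed to be given)
    cost: float   # price of including the component


def greedy_select(candidates, budget):
    """Pick components in decreasing value-to-cost order while the budget allows."""

    def ratio(c):
        # Free components have an unbounded ratio, so they sort first.
        return float("inf") if c.cost == 0 else c.value / c.cost

    chosen, spent = [], 0.0
    for comp in sorted(candidates, key=ratio, reverse=True):
        if comp.value > 0 and spent + comp.cost <= budget:
            chosen.append(comp)
            spent += comp.cost
    return chosen, spent


# Hypothetical inventory, for illustration only.
inventory = [
    Component("web_search", value=0.8, cost=2.0),
    Component("calculator", value=0.3, cost=0.0),
    Component("premium_api", value=0.9, cost=5.0),
    Component("code_runner", value=0.5, cost=2.0),
]
selected, spent = greedy_select(inventory, budget=4.0)
print([c.name for c in selected], spent)
```

With a budget of 4.0 the sketch keeps the free calculator and the two cheaper tools and skips the premium API, whose cost alone exceeds the budget. A greedy pass like this is only an approximation of the exact knapsack optimum.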
2. Dynamic Component Assessment and Thresholding
Moving beyond static retrieval, the online framework directly measures component performance via empirical trials. For each candidate component and associated skill, the composer:
- Executes structured query sets in a sandboxed environment.
- Quantifies the component's response with a normalized value score $v_i$ (proportional to the success rate over query responses, rescaled by an upper bound $U$).
- Calculates the value-to-cost ratio $v_i / c_i$.
Selection is governed by an adaptive threshold (following online knapsack secretary models, e.g., ZCL), admitting component $i$ if $v_i / c_i \ge \Psi$, where the threshold $\Psi$ is dynamically set based on observed instance characteristics. This continuous re-assessment in response to real-world data ensures that only components with demonstrably high realized utility are included and that suboptimal or redundant tools are efficiently filtered out.
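The threshold rule can be sketched with the classical ZCL threshold from the online knapsack literature, which grows exponentially with the fraction of budget already spent. This is an illustration under stated assumptions: value-to-cost ratios are assumed to lie in a known range `[L, U]`, costs are assumed positive, and the function names and example stream are hypothetical. The source only states that the threshold is adaptive, so the exact form used by the composer may differ.

```python
import math


def zcl_threshold(z, L, U):
    """Classical ZCL threshold (U*e/L)**z * (L/e) for budget fraction z in [0, 1]."""
    return (U * math.e / L) ** z * (L / math.e)


def online_admit(stream, budget, L, U):
    """Admit items one at a time; stream yields (name, value, cost) with cost > 0."""
    spent, admitted = 0.0, []
    for name, value, cost in stream:
        z = spent / budget          # fraction of the budget already used
        ratio = value / cost        # value-to-cost ratio of the candidate
        if ratio >= zcl_threshold(z, L, U) and spent + cost <= budget:
            admitted.append(name)
            spent += cost
    return admitted, spent


# Hypothetical candidates arriving in order: (name, value, cost).
stream = [("a", 0.5, 2.0), ("b", 0.3, 3.0), ("c", 0.6, 4.0), ("d", 0.9, 1.0)]
admitted, spent = online_admit(stream, budget=10.0, L=0.1, U=1.0)
print(admitted, spent)
```

Early on the threshold is low, so most candidates pass; as the budget fills, only candidates with higher ratios are admitted. In this example candidate `c` is rejected because half the budget is already spent when it arrives.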
3. Empirical Utility Modeling and Compatibility Constraints
Key to practical effectiveness is the explicit modeling not only of fixed costs and semantic fit but also of real-time utility and inter-component compatibility. The composer leverages:
- Compatibility checks: Ensuring selected components are operationally compatible (e.g., interface coherence, input/output alignment).
- Redundancy avoidance: Dynamic screening for overlaps in capability or unnecessary components under the budget constraint.
- Task-adaptive scoring: Test queries for each skill-task-component triplet are assessed by automated judges (LLMs or rule-based scripts), mitigating reliance on incomplete offline documentation.
This approach specifically addresses the problem of incomplete capability metadata that limits purely retrieval-based methods.
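The judged sandbox trials described above can be sketched as a small scoring loop. This is a minimal sketch, assuming a component is a callable from query string to answer string and the judge is a simple rule-based check; `sandbox_score`, `contains_judge`, and `toy_calculator` are hypothetical names, and a real setup would isolate execution and might use an LLM judge instead.

```python
from typing import Callable, Iterable, Tuple


def sandbox_score(
    component: Callable[[str], str],
    tests: Iterable[Tuple[str, str]],
    judge: Callable[[str, str], bool],
    upper_bound: float = 1.0,
) -> float:
    """Return the success rate over test queries, rescaled to [0, upper_bound]."""
    results = []
    for query, expected in tests:
        try:
            answer = component(query)
            results.append(judge(answer, expected))
        except Exception:
            results.append(False)  # a crash during the trial counts as a failure
    if not results:
        return 0.0
    return upper_bound * sum(results) / len(results)


def contains_judge(answer: str, expected: str) -> bool:
    """Rule-based judge: pass if the expected string appears in the answer."""
    return expected.lower() in answer.lower()


def toy_calculator(query: str) -> str:
    """Hypothetical component that only handles 'a+b' queries."""
    left, right = query.split("+")
    return str(int(left) + int(right))


tests = [("2+3", "5"), ("10+7", "17"), ("6*7", "42"), ("1+1", "3")]
score = sandbox_score(toy_calculator, tests, contains_judge)
print(score)
```

Here the toy component passes two of four trials: it fails on the multiplication query it cannot parse and on the query whose expected answer it does not produce. The resulting score reflects observed behavior rather than what the component's documentation claims.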
4. Performance Evaluation and Comparative Results
Empirical validation with Claude 3.5 Sonnet and five major benchmarking datasets (including GAIA, SimpleQA, and MedQA) demonstrates that the Online Knapsack Composer:
- Consistently lies on the empirical Pareto frontier—optimizing both success rate and component cost.
- In single-agent pipelines, achieves up to 31.6% higher success rates compared to leading semantic retrieval and static knapsack-based methods.
- In multi-agent system assembly, with a pool of over 100 candidate agents, raises success rates from around 37% (retrieval-driven selection) to 87%.
- Spends less budget on average for equal or higher success, which supports the efficiency of dynamic, empirical utility modeling.
These results are reported across task types (short-form QA, research, specialized domains) and are particularly pronounced when the inventory has high variance in component costs or capabilities.
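To make the Pareto-frontier claim concrete, the sketch below identifies which configurations are not dominated in the (lower cost, higher success) sense. The function name `pareto_frontier` is hypothetical and the numbers are made up for illustration; they are not results from the paper.

```python
def pareto_frontier(points):
    """Return names of configurations not dominated in (lower cost, higher success)."""
    frontier = []
    for name, cost, success in points:
        dominated = any(
            (c <= cost and s >= success) and (c < cost or s > success)
            for _, c, s in points
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Made-up (name, cost, success rate) triples, for illustration only.
configs = [
    ("config_a", 10.0, 0.40),
    ("config_b", 8.0, 0.45),
    ("config_c", 6.0, 0.70),
    ("config_d", 30.0, 0.72),
]
frontier = pareto_frontier(configs)
print(frontier)
```

In this toy data `config_c` dominates the two cheaper-looking but weaker options, while `config_d` stays on the frontier only because it buys a slightly higher success rate at a much higher cost.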
5. Multi-Agent and Heterogeneous System Composition
The framework extends naturally from single-agent to multi-agent settings. With a supervisor agent responsible for sub-task delegation among a large population of potential sub-agents:
- The Online Knapsack Composer dynamically probes and ranks agents by empirically measured performance against sub-task queries.
- It updates its selection online as more granular performance and compatibility data accumulates, accommodating inventory updates and component churn.
- Substantial increases in overall system success are recorded, underscoring the importance of dynamic, context-driven agent selection in heterogeneous settings.
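The online re-ranking of sub-agents can be sketched as a scoreboard of running success rates that is updated as probe results arrive. This is a minimal sketch; `AgentScoreboard`, the agent names, and the probe outcomes are hypothetical, and costs and compatibility are left out to keep the example short.

```python
from collections import defaultdict


class AgentScoreboard:
    """Running success-rate estimates per (sub_task, agent), updated online."""

    def __init__(self):
        self._trials = defaultdict(int)
        self._successes = defaultdict(int)

    def record(self, sub_task, agent, success):
        key = (sub_task, agent)
        self._trials[key] += 1
        self._successes[key] += int(success)

    def rate(self, sub_task, agent):
        key = (sub_task, agent)
        n = self._trials.get(key, 0)
        return self._successes.get(key, 0) / n if n else 0.0

    def rank(self, sub_task, agents):
        """Agents sorted by observed success rate, best first."""
        return sorted(agents, key=lambda a: self.rate(sub_task, a), reverse=True)


board = AgentScoreboard()
agents = ["agent_a", "agent_b", "agent_c"]
probes = [
    ("search", "agent_a", True), ("search", "agent_a", False),
    ("search", "agent_b", True), ("search", "agent_b", True),
    ("search", "agent_c", False),
]
for sub_task, agent, ok in probes:
    board.record(sub_task, agent, ok)
first = board.rank("search", agents)

# New evidence arrives and the ranking is recomputed.
board.record("search", "agent_a", True)
board.record("search", "agent_b", False)
board.record("search", "agent_b", False)
second = board.rank("search", agents)
print(first, second)
```

After the extra probes, `agent_a` overtakes `agent_b`, which shows how a supervisor's delegation choice can shift as performance data accumulate.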
6. Adaptability to Diverse Domains and Operational Regimes
Empirical and methodological evidence demonstrates adaptability across:
- Variable budget constraints—dynamic selection ensures effective utilization regardless of available resource envelope.
- Domain shifts—task-specific query sets and adaptive scoring cater to both open-domain factual QA and complex, domain-specific workflows.
- Inventory heterogeneity—integration of both free and pay-per-use components, with explicit cost modeling, ensures optimal balance of capability and expenditure.
7. Implications and Significance
The Online Knapsack Composer establishes a general paradigm for automated, utility- and budget-aware system composition, solving the challenge of assembling interoperable agentic systems at scale under uncertainty:
- It operationalizes real-time utility evaluation, circumventing the limitations of static capability annotations.
- It provides empirically demonstrated improvements in cost-effectiveness, measured both in success rate and total spend, across single- and multi-agent tasks.
- It is robust to rapidly changing inventories and incomplete or stale component documentation.
- A notable implication is that component selection grounded in empirically measured performance rather than static similarity or catalogues is essential for scalable and successful agentic system integration.
In summary, the Online Knapsack Composer offers a theoretically grounded and empirically validated approach to agentic composition—selecting optimal sets of components via dynamic, knapsack-inspired value-to-cost filtering, real-time utility testing, and compatibility-aware assembly, establishing measurable gains over previous semantic retrieval and static selection methods (Yuan et al., 18 Oct 2025).