Archon: An Architecture Search Framework for Inference-Time Techniques (2409.15254v6)

Published 23 Sep 2024 in cs.LG, cs.AI, and cs.CL

Abstract: Inference-time techniques, such as repeated sampling or iterative revisions, are emerging as powerful ways to enhance large language models (LLMs) at test time. However, best practices for developing systems that combine these techniques remain underdeveloped due to our limited understanding of the utility of each technique across models and tasks, the interactions between them, and the massive search space for combining them. To address these challenges, we introduce Archon, a modular and automated framework for optimizing the process of selecting and combining inference-time techniques and LLMs. Given a compute budget and a set of available LLMs, Archon explores a large design space to discover optimized configurations tailored to target benchmarks. It can design custom or general-purpose architectures that advance the Pareto frontier of accuracy vs. maximum token budget compared to top-performing baselines. Across instruction-following, reasoning, and coding tasks, we show that Archon can leverage additional inference compute budget to design systems that outperform frontier models such as OpenAI's o1, GPT-4o, and Claude 3.5 Sonnet by an average of 15.1%.

Summary

  • The paper introduces Archon, a framework that leverages Bayesian optimization to automate the search for effective inference-time architectures in LLMs.
  • The paper employs multiple techniques—ensembling, multi-sampling, ranking, fusion, critiquing, and unit testing—to systematically enhance response quality.
  • The paper demonstrates significant improvements, including a 56% boost in GPT-4o’s Pass@1 performance on CodeContests and average gains of 11.2–15.1 percentage points across key benchmarks.

Analysis of "Archon: An Architecture Search Framework for Inference-Time Techniques"

Overview

This paper introduces Archon, a novel framework for constructing inference-time architectures aimed at enhancing the performance of LLMs through an assortment of inference-time techniques. These techniques include generation ensembling, multi-sampling, ranking, fusion, critiquing, verification, and unit testing. By transforming the task of selecting and combining LLMs and inference-time techniques into a hyperparameter optimization problem, Archon employs automated Inference-Time Architecture Search (ITAS) algorithms to optimize architectures for specified benchmarks.
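The composition described above can be pictured as a pipeline of layers, each transforming a pool of candidate responses. The sketch below is purely illustrative and does not use Archon's actual API; the layer names, the stubbed model call, and the placeholder scoring are all assumptions made for the example.

```python
import random

def fake_llm(prompt, seed):
    """Stand-in for an LLM call; returns a deterministic pseudo-response."""
    rng = random.Random((prompt, seed))
    return f"response-{rng.randint(0, 9)}"

def generator(prompt, n_samples=5):
    # Generative layer: multi-sampling produces a pool of candidates.
    return [fake_llm(prompt, seed=i) for i in range(n_samples)]

def ranker(candidates, top_k=3):
    # Reductive layer: keep the top-k candidates under some score.
    # A real ranker would score with an LLM; this sorts lexicographically
    # as a placeholder.
    return sorted(candidates)[:top_k]

def fuser(candidates):
    # Reductive layer: merge surviving candidates into a single answer.
    # A real fuser would prompt an LLM; here we take the most common one.
    return max(set(candidates), key=candidates.count)

def run_architecture(prompt):
    # An inference-time architecture is just an ordered stack of layers.
    state = prompt
    for layer in (generator, ranker, fuser):
        state = layer(state)
    return state

print(run_architecture("What is 2+2?"))
```

Framing each technique as a layer with a uniform candidates-in/candidates-out interface is what lets the search treat layer choice and ordering as hyperparameters.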

Contributions

  1. Inference-Time Techniques: Archon leverages multiple inference-time techniques to improve model performance. These techniques are categorized into generative, reductive, and comparative components:
    • Generative: Generating multiple candidate responses using ensembling and multi-sampling.
    • Reductive: Aggregating or filtering responses using techniques like ranking and fusion.
    • Comparative: Providing analysis using critiquing and unit testing.
  2. Automated Architecture Search (ITAS): The ITAS algorithm integrates Bayesian optimization to efficiently explore the space of possible architectures. This results in a system capable of producing optimized inference-time architectures that outperform both single-call LLMs and existing inference-time frameworks.
  3. Empirical Evaluation: Archon is evaluated across various benchmarks, showing significant improvements over strong models like GPT-4o and Claude 3.5 Sonnet. For instance, Archon achieves an average performance increase of 15.1 percentage points using all-source models and 11.2 percentage points using open-source models.
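To make the search formulation concrete, the toy sketch below scores a small grid of inference-time configurations exhaustively. This is a deliberately simplified stand-in: the real ITAS uses Bayesian optimization over a far larger space, and the configuration fields and scoring function here are invented for illustration.

```python
import itertools

# Hypothetical search space: each key is a hyperparameter of the
# inference-time architecture (names are illustrative, not Archon's).
SEARCH_SPACE = {
    "num_samples": [1, 5, 10],
    "use_fusion": [False, True],
    "num_ranker_rounds": [0, 1, 2],
}

def evaluate(config):
    """Toy stand-in for benchmark accuracy of a configuration.

    More samples and fusion help; ranking rounds help with
    diminishing returns. Purely illustrative numbers.
    """
    score = 0.5
    score += 0.02 * config["num_samples"]
    score += 0.1 if config["use_fusion"] else 0.0
    score += 0.05 * min(config["num_ranker_rounds"], 1)
    return score

def search(space):
    # Exhaustive grid search; ITAS would instead fit a surrogate model
    # and propose promising configurations (Bayesian optimization).
    keys = list(space)
    best_cfg, best_score = None, float("-inf")
    for values in itertools.product(*(space[k] for k in keys)):
        cfg = dict(zip(keys, values))
        s = evaluate(cfg)
        if s > best_score:
            best_cfg, best_score = cfg, s
    return best_cfg, best_score

cfg, score = search(SEARCH_SPACE)
print(cfg, round(score, 2))
```

Even this tiny grid has 18 configurations; with more techniques, layer orderings, and model choices the space grows combinatorially, which is why a sample-efficient search such as Bayesian optimization becomes necessary.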

Detailed Performance Metrics

The evaluation spans a variety of instruction-following, reasoning, and coding benchmarks, including MT-Bench, Arena-Hard-Auto, AlpacaEval 2.0, MixEval, MixEval Hard, MATH, and CodeContests. The results show that Archon's architectures notably outperform single-call LLMs. For example, in coding tasks evaluated on CodeContests, Archon boosts GPT-4o's Pass@1 performance by 56%, highlighting its efficacy in enhancing the performance of existing models through optimized inference-time architectures.
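For context on the Pass@1 metric used in the CodeContests result: the standard unbiased estimator of pass@k from n sampled solutions of which c are correct (Chen et al., 2021) can be computed as follows; the function name is ours.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator: 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        # Every size-k draw must contain a correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 10 samples, 3 correct: for k=1 this reduces to c/n.
print(pass_at_k(10, 3, 1))  # ≈ 0.3
```

Repeated sampling raises pass@k for k > 1 almost for free; Archon's contribution is in reducing many samples back to a single answer well enough that Pass@1 itself improves.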

Implications and Future Directions

The practical implications of this research are substantial, particularly for tasks sensitive to accuracy and reliability, such as scientific research, customer service automation, and code generation. By enabling the efficient production of high-quality responses, Archon can significantly enhance the utility and applicability of LLMs across various domains.

Theoretically, this research underscores the potential of advanced optimization techniques in improving the performance of AI systems. The use of Bayesian optimization within ITAS exemplifies how structured, automated, and intelligent exploration of configuration spaces can yield superior results compared to manual engineering approaches.

Speculation on Future Developments

Future developments in AI could involve the following avenues:

  1. Enhanced Integration of Tools: Incorporating external tools and APIs directly into the inference-time techniques could provide additional layers of verification and enhancement.
  2. Dynamic and Context-Aware Architectures: Further optimization could lead to architectures that dynamically adjust inference-time techniques based on the context of the query.
  3. Scalability to Larger Model Pools: Expanding ITAS to handle even larger pools of models and more combinatorial logic could further improve the robustness and versatility of systems like Archon.

Conclusion

Archon represents a methodical and effective advance in the use of inference compute to amplify the capabilities of LLMs. Through the integration of multiple inference-time techniques and automated architecture search, Archon not only offers a tool for current AI challenges but also sets the stage for future innovations in the domain. The framework’s ability to robustly outperform existing LLMs across diverse evaluation benchmarks indicates its wide-ranging potential applications and reinforces the importance of specialized architecture search in the evolution of AI capabilities.

For further exploration and utilization, the code and datasets have been made publicly available on GitHub: https://github.com/ScalingIntelligence/Archon.
