Scaling the Horizon, Not the Parameters: Reaching Trillion-Parameter Performance with a 35B Agent

This lightning talk explores an approach to building powerful AI agents without relying on trillion-parameter models. The paper introduces Agents-A1, a 35-billion-parameter system that matches or exceeds the performance of models 30 times larger. It does so by scaling the agentic horizon through explicit process supervision, multi-domain training, and a novel Knowledge-Action Infrastructure that decomposes complex tasks into supervised action chains.

Script

The race to build smarter AI agents has been dominated by a single strategy: make the models bigger. But what if you could match trillion-parameter performance with just 35 billion parameters by scaling something else entirely?

The authors argue for horizon-centric scaling: building agents that excel at long-horizon tasks requiring dozens of steps, tool coordination, and persistent memory. Instead of memorizing patterns through sheer parameter count, their system learns explicit processes for planning, tool use, and iterative refinement across multiple domains.

The breakthrough is the Knowledge-Action Infrastructure, which converts diverse training data into domain-specific graphs linking evidence, actions, observations, and verification outcomes. This explicit structure exposes where decisions happen and provides actionable feedback at every step, not just the final answer.
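
To make the idea of step-level feedback concrete, here is a minimal Python sketch of one supervised action chain. It is an illustration only, not the paper's implementation: the names Step, ActionChain, step_feedback, first_failure, and outcome_only are hypothetical, and a linear chain stands in for the richer domain-specific graphs the authors describe.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    """One node in a chain: what was known, what was done, what came back."""
    evidence: str      # information the agent relied on
    action: str        # tool call or decision taken
    observation: str   # result returned by the environment
    verified: bool     # outcome of the verification check for this step


@dataclass
class ActionChain:
    """Hypothetical container for one task in one domain."""
    domain: str
    steps: List[Step] = field(default_factory=list)

    def step_feedback(self) -> List[float]:
        # Process supervision: one signal per step.
        return [1.0 if s.verified else 0.0 for s in self.steps]

    def first_failure(self) -> Optional[int]:
        # Index of the earliest unverified step, or None if all passed.
        for i, s in enumerate(self.steps):
            if not s.verified:
                return i
        return None

    def outcome_only(self) -> float:
        # Outcome supervision: only the final step is checked.
        return 1.0 if self.steps and self.steps[-1].verified else 0.0


chain = ActionChain(
    domain="search",
    steps=[
        Step("user question", "issue query", "ten results", True),
        Step("ten results", "open result 7", "unrelated page", False),
        Step("ten results", "open result 2", "answer found", True),
    ],
)
```

In this toy chain the final answer verifies, so an outcome-only signal reports success, while the per-step signal still exposes the wasted second action.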

Agents-A1 is trained in three stages. First, the model is fine-tuned with supervision across all domains. Second, specialized teachers for search, engineering, science, and tool calling are trained using domain-optimized reinforcement learning. Third, these teachers are distilled into a single deployable model using on-policy distillation with domain routing.
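
The third stage can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' training code: policies are plain functions that return a next-token distribution, the loss is a per-token reverse KL divergence (a common choice for on-policy distillation that the script does not confirm), and routing is a dictionary lookup by domain name. The names distill_loss, reverse_kl, uniform, and skewed are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List

Dist = Dict[str, float]
Policy = Callable[[List[str]], Dist]  # prefix -> next-token distribution


def reverse_kl(student: Dist, teacher: Dist) -> float:
    # KL(student || teacher); assumes the teacher gives nonzero
    # probability to every token the student can emit.
    return sum(p * math.log(p / teacher[tok])
               for tok, p in student.items() if p > 0)


def sample(dist: Dist, rng: random.Random) -> str:
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


def distill_loss(student: Policy, teachers: Dict[str, Policy], domain: str,
                 prompt: List[str], steps: int, rng: random.Random) -> float:
    teacher = teachers[domain]            # domain routing (assumed: a lookup)
    prefix = list(prompt)
    total = 0.0
    for _ in range(steps):
        s_dist = student(prefix)
        total += reverse_kl(s_dist, teacher(prefix))
        # On-policy: the next state comes from the student's own sample,
        # so the teacher grades the student where the student actually goes.
        prefix.append(sample(s_dist, rng))
    return total / steps


# Toy policies that ignore the prefix (assumption for illustration).
def uniform(_prefix: List[str]) -> Dist:
    return {"a": 0.5, "b": 0.5}


def skewed(_prefix: List[str]) -> Dist:
    return {"a": 0.9, "b": 0.1}


teachers = {"search": skewed, "science": uniform}
rng = random.Random(0)
loss_search = distill_loss(uniform, teachers, "search", ["q"], 4, rng)
loss_science = distill_loss(uniform, teachers, "science", ["q"], 4, rng)
```

Here the student already agrees with the science teacher, so that loss is zero, while the search teacher still has something to teach. A real system would backpropagate this loss through the student's parameters.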

The results are striking. On tasks like long-horizon search, scientific reasoning, and instruction following, Agents-A1 matches or surpasses models with 30 times more parameters. It scores 56.8 on molecular binding prediction and 80.6 on instruction following. The authors take this as evidence that process supervision and multi-domain training unlock capabilities previously thought to require massive scale.

This work charts a reproducible path to advanced AI agents by scaling what matters: the depth and structure of reasoning horizons, not parameter counts alone. To dive deeper into horizon-centric agent design and create your own videos exploring cutting-edge research, visit EmergentMind.com.