
SkillOpt: Teaching Agents to Evolve Their Own Expertise

This lightning talk introduces SkillOpt, a framework that treats agent skills as trainable external artifacts rather than frozen instructions. By applying deep-learning-style optimization to natural language skill documents, SkillOpt achieves large performance gains across diverse benchmarks without touching model weights. We explore how bounded editing, validation gating, and dual-speed updates enable agents to systematically refine procedural knowledge, producing compact, transferable skills that lift average accuracy by 17 to 25 points.

Script

Agent skills are usually brittle instructions that break when conditions shift. SkillOpt flips this by treating skills as external artifacts you can train, optimize, and reuse, like weights in a neural network, but written in plain language.

The agent runs tasks under a current skill, collecting what worked and what failed. An optimizer model reflects on these trajectories, proposes targeted edits to the skill document, and only accepts changes that pass validation on held-out examples.
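The loop described above can be sketched in a few lines of Python. This is only an illustrative toy, not SkillOpt's actual implementation: `optimize_skill`, `toy_score`, and `toy_propose` are hypothetical names, the "optimizer model" is replaced by a trivial rule that appends a line for the first failed task, and scoring is a simple keyword check.

```python
from typing import Callable, List, Sequence


def optimize_skill(
    skill: str,
    propose_edit: Callable[[str, List[str]], str],
    score: Callable[[str, Sequence[str]], float],
    train_tasks: Sequence[str],
    val_tasks: Sequence[str],
    steps: int = 3,
) -> str:
    """Keep a proposed skill only if it scores higher on held-out tasks."""
    best_val = score(skill, val_tasks)
    for _ in range(steps):
        # Roll out the current skill and record what worked and what failed.
        trajectories = [
            f"{task}: {'ok' if score(skill, [task]) > 0 else 'fail'}"
            for task in train_tasks
        ]
        # Stand-in for the optimizer model reflecting on the trajectories.
        candidate = propose_edit(skill, trajectories)
        # Validation gate: accept only on strict held-out improvement.
        cand_val = score(candidate, val_tasks)
        if cand_val > best_val:
            skill, best_val = candidate, cand_val
    return skill


def toy_score(skill: str, tasks: Sequence[str]) -> float:
    """Toy metric (assumption): a task succeeds if the skill mentions it."""
    return sum(1 for task in tasks if task in skill) / len(tasks)


def toy_propose(skill: str, trajectories: List[str]) -> str:
    """Toy optimizer (assumption): add one rule for the first failed task."""
    failed = [line.split(":")[0] for line in trajectories if line.endswith("fail")]
    return skill + f"\n- handle {failed[0]}" if failed else skill


final = optimize_skill(
    "Rules:", toy_propose, toy_score,
    train_tasks=["sort", "merge"], val_tasks=["sort", "merge"],
)
print(final)
```

In this toy run the first two proposals pass the gate and the third is rejected because it does not improve the held-out score, so the skill ends with two accepted edits.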

Each optimization step is bounded by a textual learning rate that caps how much the skill can change at once. Most proposals are rejected and buffered as negative feedback, so the edits that are accepted generalize rather than overfit to individual failures.
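One way to picture a bounded step is as an edit budget plus a rejection buffer. The sketch below is an assumption made for illustration: it measures the size of an edit as the number of changed lines (via Python's `difflib`), which may differ from how SkillOpt defines its textual learning rate, and `changed_lines`, `bounded_step`, and `max_changed_lines` are hypothetical names.

```python
import difflib
from typing import List


def changed_lines(old: str, new: str) -> int:
    """Count lines inserted, deleted, or replaced between two skill versions."""
    a, b = old.splitlines(), new.splitlines()
    ops = difflib.SequenceMatcher(a=a, b=b, autojunk=False).get_opcodes()
    return sum(max(i2 - i1, j2 - j1) for tag, i1, i2, j1, j2 in ops if tag != "equal")


def bounded_step(
    skill: str,
    candidate: str,
    improves_validation: bool,
    max_changed_lines: int,
    rejected: List[str],
) -> str:
    """Apply a candidate edit only if it is small enough and passes validation.

    Rejected candidates are appended to `rejected` so that a later proposal
    step could be shown them as negative feedback.
    """
    too_large = changed_lines(skill, candidate) > max_changed_lines
    if too_large or not improves_validation:
        rejected.append(candidate)
        return skill
    return candidate


skill = "check headers\nparse rows\nwrite output"
small_edit = "check headers\nparse rows, skipping blanks\nwrite output"
full_rewrite = "do everything\nin one pass\nthen stop"

rejected: List[str] = []
after_rewrite = bounded_step(skill, full_rewrite, True, 2, rejected)  # over budget
after_small = bounded_step(skill, small_edit, True, 2, rejected)      # accepted
after_failed = bounded_step(skill, small_edit, False, 2, rejected)    # fails validation
```

Here the full rewrite is refused even though it is marked as improving validation, because it changes three lines against a budget of two; the one-line edit goes through only when it also passes the gate.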

Across six benchmarks and seven models, SkillOpt lifts average accuracy by 17 to 25 points, outperforming every baseline including hand-written skills. On procedural tasks like spreadsheet manipulation, gains exceed 30 points with just one to four accepted edits per skill.

Optimized skills transfer across model sizes, execution harnesses, and even nearby task domains, behaving as true reusable artifacts. Final documents are compact, typically under 2,000 tokens, and encode general procedural rules rather than memorized solutions.

By decoupling skill training from model weights, SkillOpt enables offline optimization with frontier models while adding zero cost at deployment. Head over to EmergentMind.com to explore the full paper and create your own video summaries of the latest research.