PolySkill: Continual Skill Induction Framework
- PolySkill is a framework that utilizes polymorphic abstraction to separate abstract goals from concrete implementations, enabling generalizable web agent skills.
- It applies modular and compositional techniques to induce, refine, and reuse skills across diverse websites and domains.
- Empirical evaluations report higher task success rates (up to 13.9% relative improvement on unseen sites), more than 20% fewer steps per task, and up to 1.7× higher skill reusability than baseline methods.
PolySkill is a framework for continual skill induction in web agents. It is designed to produce efficient, reusable, and generalizable skills through polymorphic abstraction, which systematically separates a skill's abstract goal from its concrete implementation. LLM-powered agents operating in partially observable web environments use PolySkill for two purposes: to solve novel user-specified tasks, and to autonomously build a polymorphic skill library whose skills remain transferable across diverse websites and domains. Central to the framework is an abstraction method borrowed from software engineering, polymorphic binding, which supports modular, compositional skill induction and robust cross-site generalization (Yu et al., 17 Oct 2025).
1. Mathematical Modeling of Continual Skill Learning
The PolySkill paradigm frames web-agent skill learning as a partially observable Markov decision process (POMDP) with a growing skill library:
- State space ($\mathcal{S}$): Latent web environment configuration (DOM tree, open tabs, URL).
- Primitive action space ($\mathcal{A}$): Low-level web actions such as click, type, scroll, and navigation.
- Skill library ($\mathcal{K}_t$ at time $t$): Set of reusable, parameterized macro-actions (skills), each of which may invoke primitives or previously induced skills.
- Expanded action space ($\mathcal{A} \cup \mathcal{K}_t$): Permits both direct and compositional invocation.
- Observation space ($\mathcal{O}$): Tree and visual representations (e.g., accessibility (A11y) tree + screenshot).
- Transition ($T$) and observation ($\Omega$) functions.
- Task distribution ($\mathcal{D}$): Source of user-specified or self-generated natural-language instructions.
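The LLM-based agent policy $\pi(a_t \mid o_{1:t}, a_{1:t-1}, \mathcal{K}_t, m_t)$ selects the next action $a_t \in \mathcal{A} \cup \mathcal{K}_t$ from observations, action history, and skills, where $m_t$ records the working memory. For a horizon $H$, the resulting trajectory is $\tau = (o_1, a_1, \ldots, o_H, a_H)$. The immediate objective is to maximize an efficiency-aware reward:
$$R(\tau) = \mathbb{1}[\text{success}(\tau)] - \lambda\,|\tau|$$
Here $\mathbb{1}[\text{success}(\tau)]$ indicates task success and $\lambda\,|\tau|$ (with $\lambda > 0$) penalizes lengthy trajectories. The objective is subject to additional regularization terms on $\mathcal{K}_t$ that promote skill quality and reuse. Because an invoked skill counts as a single action, this objective favors compact, reusable skills, consistent with continual learning goals (Yu et al., 17 Oct 2025).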
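The sketch below shows the efficiency-aware reward as a success indicator minus a linear per-step penalty. The `Trajectory` record and the coefficient `step_penalty` (standing in for λ) are illustrative assumptions, not values from the paper. The sketch is meant to show one effect: a skill call counts as one action, so a run that reuses a skill scores higher than an equally successful primitive-only run.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Hypothetical episode record: the actions taken and whether the task succeeded."""
    actions: list[str]
    success: bool


def efficiency_reward(traj: Trajectory, step_penalty: float = 0.01) -> float:
    """Success indicator minus a linear length penalty.

    `step_penalty` plays the role of lambda; 0.01 is an assumed value.
    """
    success_term = 1.0 if traj.success else 0.0
    return success_term - step_penalty * len(traj.actions)


# Both runs succeed, but the second replaces five primitives with one skill call.
primitive_run = Trajectory(["click", "type", "press", "click", "scroll", "click"], success=True)
skill_run = Trajectory(["search('usb cable')", "click"], success=True)
print(efficiency_reward(skill_run) > efficiency_reward(primitive_run))  # True
```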
2. Polymorphic Abstraction Mechanism
PolySkill’s core innovation is the strict decoupling of each skill into:
- An abstract goal $G_k$: a method signature specifying what is to be accomplished (e.g., search(query):ResultList).
- A set of concrete implementations $\{I_k^{(w)}\}$: website-specific programs or action traces that encode how $G_k$ is achieved on site $w$.
A domain-level abstract class $C_d$ (e.g., AbstractShoppingSite) exposes a standardized interface $\{G_1, \ldots, G_n\}$. For each website $w$, the LLM receives $G_k$ together with a successful reference trajectory $\tau_w$ and generates the site-specific implementation $I_k^{(w)}$. On a new site $w'$ in the same domain, PolySkill reuses $G_k$ and binds it to a new implementation $I_k^{(w')}$ via the same prompt-encoded procedure; this is polymorphic binding. If $c_w$ is a context embedding summarizing $\tau_w$, then $I_k^{(w)} = \mathrm{LLM}(G_k, c_w)$.
This approach ensures reusability: abstract interfaces are shared while implementation details are tailored, allowing skills to generalize and adapt to site-level variation (Yu et al., 17 Oct 2025).
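The split between abstract goal and implementation maps directly onto Python's abc module. This hypothetical sketch uses the following stand-ins:

- AbstractShoppingSite plays the role of the domain interface.
- Each abstract method is one abstract goal $G_k$.
- SiteA and SiteB represent two websites.

The add_to_cart goal and all selectors are invented for illustration. In PolySkill the concrete bodies would be generated by the LLM from $G_k$ and a reference trajectory rather than written by hand.

```python
from abc import ABC, abstractmethod


class AbstractShoppingSite(ABC):
    """Hypothetical domain interface: each abstract method is one abstract goal G_k."""

    @abstractmethod
    def search(self, query: str) -> list[str]:
        """Return the primitive actions that run a product search."""

    @abstractmethod
    def add_to_cart(self, item: str) -> list[str]:
        """Return the primitive actions that add `item` to the cart."""


class SiteA(AbstractShoppingSite):
    # Concrete implementations for site A (invented selectors).
    def search(self, query: str) -> list[str]:
        return ["click(#search)", f"type(#search, {query})", "press(Enter)"]

    def add_to_cart(self, item: str) -> list[str]:
        return [f"click(link={item})", "click(#add-to-cart)"]


class SiteB(AbstractShoppingSite):
    # Site B hides its search box behind a menu, so the same goal needs different steps.
    def search(self, query: str) -> list[str]:
        return ["click(#menu)", f"type(#q, {query})", "click(#go)"]

    def add_to_cart(self, item: str) -> list[str]:
        return [f"click(text={item})", "click(button=Add to basket)"]


def buy(site: AbstractShoppingSite, item: str) -> list[str]:
    """Written once against the interface; dynamic dispatch picks the site-specific bodies."""
    return site.search(item) + site.add_to_cart(item)


for site in (SiteA(), SiteB()):
    print(type(site).__name__, buy(site, "usb cable"))
```

Instantiating AbstractShoppingSite directly raises TypeError. This mirrors the idea that an abstract goal only becomes executable once it is bound to a concrete site.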
3. Skill Representation and Compositionality
Each skill $k \in \mathcal{K}_t$ is modeled as a pair $(\sigma_k, b_k)$:
- A signature $\sigma_k$: method name, arguments, and return types.
- A body $b_k$: Python code or a concrete action sequence, stored in the dynamic skill library $\mathcal{K}_t$.
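One way to keep a skill's signature and body together is to derive the signature from a Python function with inspect.signature. The Skill dataclass, the search body, and the dict-based library below are hypothetical. They show how a name, typed arguments, and a return type can be recorded alongside executable code.

```python
import inspect
from dataclasses import dataclass
from typing import Callable


@dataclass
class Skill:
    """Hypothetical container for one skill k: signature sigma_k plus body b_k."""
    name: str
    signature: inspect.Signature     # argument names/types and return type
    body: Callable[..., list[str]]   # executable Python body

    @classmethod
    def from_function(cls, fn: Callable[..., list[str]]) -> "Skill":
        # Read the signature straight from the function's annotations.
        return cls(fn.__name__, inspect.signature(fn), fn)


def search(query: str) -> list[str]:
    """Illustrative skill body returning a primitive action trace."""
    return ["click(#search)", f"type(#search, {query})", "press(Enter)"]


library: dict[str, Skill] = {}  # stand-in for the dynamic library K_t
skill = Skill.from_function(search)
library[skill.name] = skill

print(skill.name, list(skill.signature.parameters))  # search ['query']
print(library["search"].body("usb cable"))
```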
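Skills support arbitrary composition: higher-order (composite) skills can invoke other skills by name, yielding recursive and modular behavior. For example, a purchase workflow may be assembled as a composite skill $k_{\text{purchase}} = k_m \circ \cdots \circ k_1$, where each $k_i \in \mathcal{A} \cup \mathcal{K}_t$. Executing a composite skill recursively expands each invoked skill until only primitive actions from $\mathcal{A}$ remain.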
This compositional logic permits dense combinatorial reuse and supports curriculum growth via hierarchical abstraction (Yu et al., 17 Oct 2025).
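The recursive expansion of composite skills can be sketched as a small interpreter. The sketch makes these assumptions:

- The library is a dict from skill name to a list of `(callee, args)` steps.
- A `"$i"` argument refers to the caller's i-th argument.
- The skill names (`open_first_result`, `purchase`, and so on) are hypothetical.

It shows a composite skill being inlined down to primitive actions, with a depth limit guarding against runaway recursion.

```python
# Hypothetical primitive action names.
PRIMITIVES = {"click", "type", "press", "goto"}

# Hypothetical skill library: each body is a list of (callee, args) steps, where the
# callee is a primitive or another skill's name and "$i" means "caller's i-th argument".
LIBRARY: dict[str, list[tuple[str, tuple[str, ...]]]] = {
    "search": [("click", ("#search",)), ("type", ("#search", "$0")), ("press", ("Enter",))],
    "open_first_result": [("click", (".result:first-child",))],
    "add_to_cart": [("click", ("#add-to-cart",))],
    # Composite skill: calls three other skills, then one primitive.
    "purchase": [("search", ("$0",)), ("open_first_result", ()), ("add_to_cart", ()), ("goto", ("/checkout",))],
}


def expand(name: str, args: tuple[str, ...] = (), depth: int = 0, max_depth: int = 10) -> list[str]:
    """Recursively inline skill calls until only primitive actions remain."""
    if name in PRIMITIVES:
        return [f"{name}({', '.join(args)})"]
    if depth >= max_depth:
        raise RecursionError(f"skill nesting deeper than {max_depth} at {name!r}")
    trace: list[str] = []
    for callee, raw_args in LIBRARY[name]:
        # Substitute "$i" placeholders with the caller's arguments.
        bound = tuple(args[int(a[1:])] if a.startswith("$") else a for a in raw_args)
        trace.extend(expand(callee, bound, depth + 1, max_depth))
    return trace


print(expand("purchase", ("usb cable",)))
```

Storing bodies as data rather than opaque functions keeps composition inspectable. For example, counting how many skills call other skills only requires scanning `LIBRARY`.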
4. Induction, Refinement, and Regularization
PolySkill alternates between task execution, success verification, skill induction, and library update:
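Task-defined Induction (Algorithm 1):
For each task $q \sim \mathcal{D}$, the agent follows four steps:
1. Execute $q$ using $\mathcal{A} \cup \mathcal{K}_t$.
2. Verify whether the resulting trajectory $\tau$ succeeded.
3. If it did, induce new or refined skills from $\tau$.
4. Add those skills to the library to obtain $\mathcal{K}_{t+1}$.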
Task-free Self-exploration (Algorithm 2):
Here, the agent proposes its own tasks instead of drawing them from $\mathcal{D}$, then repeats the same induction/verification loop.
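The two loops can be sketched with the LLM components replaced by toy stand-ins. `execute`, `verify`, `induce`, and `toy_propose` are hypothetical callables here; in PolySkill each would be LLM-driven. The loop follows the order described above: execute, verify, induce, update. Task-free mode only swaps the task source for a lazy generator that proposes each new task from the current library.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Library = dict[str, list[str]]  # skill name -> body (here simply a list of primitive actions)


@dataclass
class Episode:
    task: str
    actions: list[str]
    success: bool


def continual_induction(
    tasks: Iterable[str],
    execute: Callable[[str, Library], Episode],
    verify: Callable[[Episode], bool],
    induce: Callable[[Episode], Library],
    library: Library,
) -> Library:
    """One pass of execute -> verify -> induce -> update per task."""
    for task in tasks:
        episode = execute(task, library)   # act with primitives plus current skills
        if not verify(episode):            # only verified successes are mined for skills
            continue
        for name, body in induce(episode).items():
            library[name] = body           # add a new skill or overwrite with a refinement
    return library


# --- Toy stand-ins for the LLM components (hypothetical) ---
def toy_execute(task: str, lib: Library) -> Episode:
    verb = task.split()[0]
    if verb in lib:  # reuse: one skill call replaces the primitive steps
        return Episode(task, [verb], True)
    return Episode(task, ["click", "type", "press"], True)


def toy_induce(ep: Episode) -> Library:
    # Turn a multi-step success into a skill named after the task's first word.
    return {ep.task.split()[0]: ep.actions} if len(ep.actions) > 1 else {}


def toy_propose(lib: Library) -> str:
    # Task-free mode: propose a goal the library cannot handle yet.
    return "checkout cart" if "search" in lib else "search socks"


def always_ok(ep: Episode) -> bool:
    return ep.success


# Algorithm 1 style: tasks come from a fixed list.
task_lib = continual_induction(["search shoes", "search hats"], toy_execute, always_ok, toy_induce, {})
print(task_lib)  # {'search': ['click', 'type', 'press']}

# Algorithm 2 style: a lazy generator proposes each task from the current library.
explore_lib: Library = {}
proposals = (toy_propose(explore_lib) for _ in range(2))
continual_induction(proposals, toy_execute, always_ok, toy_induce, explore_lib)
print(sorted(explore_lib))  # ['checkout', 'search']
```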
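Skill library structure is regularized for polymorphism by penalizing structural divergence between implementations of the same abstract goal across sites:
$$\mathcal{R}_{\text{poly}}(\mathcal{K}_t) = \sum_{k} \sum_{w < w'} \mathrm{Dist}\big(I_k^{(w)}, I_k^{(w')}\big)$$
Dist may be a code edit distance or an embedding-based dissimilarity, such as one minus cosine similarity (Yu et al., 17 Oct 2025).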
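The following is a minimal sketch of the polymorphism penalty, summing Dist over every unordered pair of sites that implement the same goal. As a stand-in for code edit distance it uses `1 - difflib.SequenceMatcher.ratio()`. That ratio is a matching-block similarity, not a true Levenshtein distance, so treat it as a substitute. The `impls` layout (goal name, then site name, then source string) and the example code strings are assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def dist(a: str, b: str) -> float:
    """Normalized dissimilarity in [0, 1]: 1 minus difflib's similarity ratio."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def polymorphism_penalty(impls: dict[str, dict[str, str]]) -> float:
    """Sum Dist over every unordered pair of sites that implement the same abstract goal.

    impls maps goal name -> {site name -> implementation source code}.
    """
    total = 0.0
    for per_site in impls.values():
        for code_a, code_b in combinations(per_site.values(), 2):
            total += dist(code_a, code_b)
    return total


impls = {
    "search": {
        "site_a": "click('#search'); type('#search', q); press('Enter')",
        "site_b": "click('#q'); type('#q', q); press('Enter')",
    },
    "add_to_cart": {"site_a": "click('#add-to-cart')"},  # a single site contributes no pairs
}
print(round(polymorphism_penalty(impls), 3))
```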
5. Experimental Protocols and Evaluation Metrics
PolySkill is benchmarked on Mind2Web (2,350 tasks, 137 sites, 31 domains), WebArena (812 tasks, five sites), and live sites (Amazon, Target, GitHub, GitLab).
Metrics (Appendix A) include:
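- Task Success Rate (SR) over $N$ evaluation tasks:
$$\mathrm{SR} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}[\text{success}(\tau_i)]$$
- Average Steps per successful trajectory, where $\mathcal{T}^{+}$ is the set of successful trajectories:
$$\overline{\mathrm{Steps}} = \frac{1}{|\mathcal{T}^{+}|} \sum_{\tau \in \mathcal{T}^{+}} |\tau|$$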
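- Skill Reusability: how often previously induced skills are invoked again on later tasks.
- Task Coverage (adoption): the share of tasks in which the agent invokes at least one library skill.
- Skill Compositionality: the degree to which skills are built by invoking other skills rather than only primitive actions.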
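The first three metrics can be computed from per-task logs as sketched below. The `Run` record is a hypothetical log format, and `task_coverage` encodes an assumed reading of "adoption" (at least one skill invoked), not the paper's exact formula.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One evaluated task (hypothetical log format)."""
    success: bool
    steps: int
    skills_used: list[str] = field(default_factory=list)  # library skills invoked


def success_rate(runs: list[Run]) -> float:
    return sum(r.success for r in runs) / len(runs)


def avg_steps_successful(runs: list[Run]) -> float:
    steps = [r.steps for r in runs if r.success]  # failed runs are excluded
    return sum(steps) / len(steps) if steps else float("nan")


def task_coverage(runs: list[Run]) -> float:
    # Assumed reading of "adoption": share of tasks that invoke at least one skill.
    return sum(bool(r.skills_used) for r in runs) / len(runs)


runs = [
    Run(True, 4, ["search"]),
    Run(True, 8),
    Run(False, 12, ["search"]),
    Run(True, 3, ["search", "add_to_cart"]),
]
print(success_rate(runs), avg_steps_successful(runs), task_coverage(runs))  # 0.75 5.0 0.75
```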
Quantitative results show PolySkill's gains over the baseline agent and ASI (Agent Skill Induction). Mind2Web results by test split:
| Method | Cross-task | Cross-site | Cross-domain |
|---|---|---|---|
| Baseline | 53.8% | 56.2% | 62.3% |
| ASI (+Online) | 59.4% | 58.7% | 62.1% |
| PolySkill (+Online) | 63.2% | 61.3% | 63.4% |
WebArena success rate (%) by site:
| Method | Shopping | Admin | Reddit | GitLab | Map | Cross-app | Avg |
|---|---|---|---|---|---|---|---|
| Baseline | 37.4 | 44.0 | 66.0 | 38.9 | 16.4 | 10.3 | 38.5 |
| ASI | 46.3 | 53.6 | 73.7 | 46.8 | 21.5 | 15.1 | 46.5 |
| PolySkill | 51.4 | 54.8 | 73.2 | 54.2 | 18.9 | 18.9 | 49.3 |
Additional findings:
- Skill reusability up to 31% (1.7× baseline).
- >20% reduction in steps per task via skill reuse.
- Up to 13.9% relative SR improvement on unseen sites (Yu et al., 17 Oct 2025).
6. Analytical Results and Ablations
Empirical studies reveal:
- Skill reuse correlates inversely with average steps. At about 20% reuse, trajectories average roughly 3.3 to 4.4 steps, and PolySkill reaches 20.4% reuse by task 180.
- Continual learning experiments: PolySkill shows stronger positive transfer during cross-site adaptation (e.g., from WebArena Shopping to Amazon/Target). It also shows near-zero catastrophic forgetting, retaining its success rate on the original domain, whereas ASI does not.
- Autonomous exploration: without preset tasks, PolySkill's self-guided curriculum induces generalizable skills. It reaches 43.1% SR on shopping sites (vs. below 38% with a single-site curriculum) and 66.2% SR on held-out coding platforms, exceeding static and specialist baselines (Yu et al., 17 Oct 2025).
7. Limitations and Prospects for Expansion
Key contributions include:
- Bringing polymorphic abstraction from object-oriented programming (OOP) to LLM skill induction.
- Modular interfaces that support transfer and composition.
- Empirical gains under both continual and self-supervised (task-free) learning.
Identified limitations:
- Skill generality is dependent on the quality of the initial abstract interface; suboptimal abstractions propagate errors.
- Skills may degrade on dynamic sites with changing DOMs, necessitating periodic re-induction.
- Generalization to “long-tail” domains not aligning with established abstractions remains a challenge.
Proposed directions:
- Development of automatic skill-repair mechanisms to update implementations following site changes.
- Integration of failure analysis routines to improve G_k-to-I_k mappings based on unsuccessful bindings.
- Autonomous RL-based discovery of polymorphic skills with compact specialized models.
- Community-driven collaborative skill libraries with versioning and quality review (Yu et al., 17 Oct 2025).