
PolySkill: Continual Skill Induction Framework

Updated 3 July 2026
  • PolySkill is a framework that utilizes polymorphic abstraction to separate abstract goals from concrete implementations, enabling generalizable web agent skills.
  • It applies modular and compositional techniques to induce, refine, and reuse skills across diverse websites and domains.
  • Empirical evaluations show significant improvements in task success rates, reduced steps per task, and enhanced skill transfer compared to baseline methods.

PolySkill is a framework for continual skill induction in web agents, designed to enable efficient, reusable, and generalizable skills by leveraging polymorphic abstraction—systematically separating a skill’s abstract goal from its concrete implementation. LLM-powered agents, operating in partially observable web environments, utilize PolySkill to both solve novel user-specified tasks and to autonomously construct a polymorphic skill library that retains transferability across diverse websites and domains. Central to the framework is the adoption of abstraction methods from software engineering—specifically, polymorphic binding—to support modular, compositional skill induction and robust cross-site generalization (Yu et al., 17 Oct 2025).

1. Mathematical Modeling of Continual Skill Learning

The PolySkill paradigm frames web-agent skill learning as a partially observable Markov decision process (POMDP) with a growing skill library:

  • State space ($S$): Latent web environment configuration (DOM tree, open tabs, URL).
  • Primitive action space ($A_p$): Low-level web actions such as click, type, scroll, and navigation.
  • Skill library ($K_t$ at time $t$): Set of reusable, parameterized macro-actions (skills), each possibly invoking primitives or previously induced skills.
  • Expanded action space ($A_t = A_p \cup K_t$): Permits both direct and compositional invocation.
  • Observation space ($\Omega$): Tree and visual representations (e.g., accessibility (A11y) tree plus screenshot).
  • Transition ($T: S \times A_p \rightarrow \Delta(S)$) and observation ($O: S \rightarrow \Delta(\Omega)$) functions.
  • Task distribution ($Q$): Source of user or self-generated natural language instructions.

The LLM-based agent policy $\pi_L(a_t \mid o_t, M_t, K_t)$, where $M_t$ records the working memory, determines the next action given observations, action history, and skills. For a horizon $H$, the resulting trajectory is $\tau = (o_0, a_0, o_1, a_1, \ldots, o_H)$. The immediate objective is to maximize an efficiency-aware reward:

$$R(\tau) = \mathbb{1}_{\text{succ}}(\tau) - \lambda\,|\tau|$$

where $\mathbb{1}_{\text{succ}}(\tau)$ indicates task success and $\lambda\,|\tau|$ penalizes lengthy trajectories, subject to additional regularization terms on $K_t$ to promote skill quality and reuse. This incentivizes compact, reusable skill formation consistent with continual learning goals (Yu et al., 17 Oct 2025).
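The trade-off between success and trajectory length can be illustrated with a minimal sketch. It assumes a linear per-step penalty; the `Trajectory` class, the `step_penalty` value, and the action names are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    actions: list[str]  # primitive or skill actions taken, in order
    success: bool       # verdict of the task-success check


def efficiency_aware_reward(traj: Trajectory, step_penalty: float = 0.01) -> float:
    """Success indicator minus a length penalty (assumed linear in steps)."""
    return float(traj.success) - step_penalty * len(traj.actions)


# Two successful runs of the same task: one with ten primitive actions,
# one that reaches the goal in two steps by calling a reusable skill.
long_run = Trajectory(actions=["click"] * 10, success=True)
short_run = Trajectory(actions=["search_skill", "click"], success=True)
```

Under this assumed reward, the shorter skill-based run scores higher than the longer primitive-only run, which is the pressure toward compact skill reuse described above.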

2. Polymorphic Abstraction Mechanism

PolySkill’s core innovation is the strict decoupling of each skill into:

  • An abstract goal $G_k$: Method signature specifying what is to be accomplished (e.g., search(query):ResultList).
  • A set of concrete implementations $\{I_k^{w}\}$: Website-specific programs or action traces that encode how $G_k$ is achieved on site $w$.

A domain-level abstract class $C$ (e.g., AbstractShoppingSite) exposes a standardized interface $\{G_k\}$. For each website $w$, the LLM, given $G_k$ together with a successful reference trajectory $\tau$, generates the site-specific implementation $I_k^{w}$. On visiting a new site $w'$ in the same domain, PolySkill reuses $G_k$ and binds it to a new $I_k^{w'}$ via the same prompt-encoded procedure, known as polymorphic binding. If $c_{w'}$ is a context embedding summarizing $w'$, then $I_k^{w'} = \mathrm{LLM}(G_k, c_{w'})$.

This approach ensures reusability: abstract interfaces are shared while implementation details are tailored, allowing skills to generalize and adapt to site-level variation (Yu et al., 17 Oct 2025).
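The separation of abstract goal from site-specific implementation maps naturally onto an abstract base class. The sketch below is illustrative only: `AbstractShoppingSite` is the example name from the text, while `SiteA`, `SiteB`, the CSS selectors, and the choice to return an action trace (rather than a result list) are assumptions made for this example.

```python
from abc import ABC, abstractmethod


class AbstractShoppingSite(ABC):
    """Domain-level interface: declares what each skill achieves."""

    @abstractmethod
    def search(self, query: str) -> list[str]:
        """Return the action trace that performs a product search."""


class SiteA(AbstractShoppingSite):
    # Hypothetical site with a plain search box and submit button.
    def search(self, query: str) -> list[str]:
        return [f"type '{query}' into #search-box", "click #search-submit"]


class SiteB(AbstractShoppingSite):
    # Hypothetical site where the search field must be opened first.
    def search(self, query: str) -> list[str]:
        return [
            "click .nav-search-icon",
            f"type '{query}' into input[name=q]",
            "press Enter",
        ]


def run_search(site: AbstractShoppingSite, query: str) -> list[str]:
    """Caller depends only on the abstract goal, not on the site."""
    return site.search(query)
```

Calling `run_search(SiteA(), "usb cable")` and `run_search(SiteB(), "usb cable")` uses the same interface but yields different action traces, which is the binding behavior the section describes.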

3. Skill Representation and Compositionality

Each skill $k \in K_t$ is modeled as:

  • A signature $\sigma_k$: (method name, arguments, return types).
  • A body $b_k$: Python code or concrete action sequence stored in the dynamic skill library $K_t$.

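Skills support arbitrary composition: higher-order or composite skills can invoke other skills by name, yielding recursive and modular behavior. For example, a purchase workflow may be assembled from simpler skills (skill names here are illustrative):

  purchase(query) := checkout(add_to_cart(search(query)))

More generally, a composite skill chains its constituent skills in sequence:

$$k_{\text{comp}}(x) = (k_n \circ \cdots \circ k_1)(x), \quad k_i \in K_t$$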

This compositional logic permits dense combinatorial reuse and supports curriculum growth via hierarchical abstraction (Yu et al., 17 Oct 2025).
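Composition by name can be sketched with a dictionary-backed skill library. This is a minimal illustration, not the paper's implementation: the `register` decorator, the skill names, and the string-encoded actions are all hypothetical.

```python
from typing import Callable

# Skill library: maps a skill name to a callable that returns an action trace.
library: dict[str, Callable[..., list[str]]] = {}


def register(name: str):
    """Decorator that stores a skill in the library under the given name."""
    def decorator(fn):
        library[name] = fn
        return fn
    return decorator


@register("search")
def search(query: str) -> list[str]:
    return [f"type:{query}", "click:search"]


@register("add_to_cart")
def add_to_cart() -> list[str]:
    return ["click:add-to-cart"]


@register("checkout")
def checkout() -> list[str]:
    return ["click:cart", "click:checkout"]


@register("purchase")
def purchase(query: str) -> list[str]:
    # Composite skill: looks up its constituents by name at call time,
    # so a refined "search" is picked up without editing "purchase".
    return library["search"](query) + library["add_to_cart"]() + library["checkout"]()
```

Looking skills up by name at call time, rather than binding function objects directly, is one way to let a composite skill benefit from later refinements of its parts.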

4. Induction, Refinement, and Regularization

PolySkill alternates between task execution, success verification, skill induction, and library update:

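Task-defined Induction (Algorithm 1):

  • Sample a task $q \sim Q$ and execute it with $\pi_L$ over the expanded action space $A_t = A_p \cup K_t$, producing a trajectory $\tau$.
  • Verify whether $\tau$ accomplishes $q$.
  • If it does, induce candidate skills from $\tau$.
  • Add the induced skills to the library to obtain $K_{t+1}$, then continue with the next task.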

Task-free Self-exploration (Algorithm 2):

Here, the agent proposes its own goals with the LLM, in place of sampling instructions from the task distribution $Q$, and repeats the same induction/verification loop.
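The alternation of execution, verification, induction, and library update can be sketched as a loop with pluggable components. The function below is an assumption-laden outline: `execute`, `verify`, and `induce` are hypothetical callables standing in for the agent rollout, the success check, and the LLM-based skill extractor, and the policy of keeping the first skill stored under a name is a simplification.

```python
def continual_induction(tasks, execute, verify, induce, library):
    """Run tasks in order, growing the skill library from verified successes."""
    for task in tasks:
        trajectory = execute(task, library)      # roll out with current skills
        if not verify(task, trajectory):         # skip failed attempts
            continue
        for name, skill in induce(trajectory).items():
            library.setdefault(name, skill)      # keep the first version of each skill
    return library


# Toy stand-ins so the loop can be run end to end.
tasks = ["find a laptop", "broken task"]
execute = lambda task, lib: ["type", "click"] if "laptop" in task else []
verify = lambda task, traj: len(traj) > 0
induce = lambda traj: {"search": tuple(traj)}

result = continual_induction(tasks, execute, verify, induce, {})
```

For task-free exploration, the same loop applies with `tasks` replaced by a generator of self-proposed goals.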

Skill library structure is regularized for polymorphism by penalizing structural divergence between implementations for the same abstract goal across sites:

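$$\mathcal{L}_{\text{poly}} = \sum_{k} \sum_{w \neq w'} \mathrm{Dist}\big(I_k^{w}, I_k^{w'}\big)$$

where $I_k^{w}$ denotes the implementation of abstract goal $k$ on site $w$, and Dist may be a code edit distance or an embedding-based distance (Yu et al., 17 Oct 2025).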
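A divergence penalty of this kind can be approximated with the standard library. The sketch below substitutes `difflib.SequenceMatcher` similarity for a true edit distance, which is an assumption of this example; the nested-dictionary layout (goal name, then site name, then implementation source) is likewise hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def code_distance(a: str, b: str) -> float:
    """1 minus the SequenceMatcher similarity ratio; 0.0 for identical strings."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def polymorphism_penalty(impls_by_goal: dict[str, dict[str, str]]) -> float:
    """Sum pairwise distances between implementations of the same goal."""
    total = 0.0
    for impls in impls_by_goal.values():
        for src_a, src_b in combinations(impls.values(), 2):
            total += code_distance(src_a, src_b)
    return total


impls = {
    "search": {
        "site_a": "type(query, '#search-box'); click('#submit')",
        "site_b": "type(query, 'input[name=q]'); press('Enter')",
    },
    "checkout": {
        "site_a": "click('#cart'); click('#checkout')",
        "site_b": "click('#cart'); click('#checkout')",
    },
}
penalty = polymorphism_penalty(impls)
```

In this toy input only the two `search` implementations differ, so they alone contribute to the penalty; the identical `checkout` pair contributes zero.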

5. Experimental Protocols and Evaluation Metrics

PolySkill is benchmarked on Mind2Web (2,350 tasks, 137 sites, 31 domains), WebArena (812 tasks, five sites), and live sites (Amazon, Target, GitHub, GitLab).

Metrics (Appendix A) include:

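  • Success Rate (SR):

$$\mathrm{SR} = \frac{N_{\text{succ}}}{N_{\text{tasks}}}$$

  • Average Steps per successful trajectory:

$$\text{AvgSteps} = \frac{1}{N_{\text{succ}}} \sum_{\tau \in \mathcal{T}_{\text{succ}}} |\tau|$$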

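  • Skill Reusability:

$$\text{Reusability} = \frac{\#\{\text{induced skills invoked again in later tasks}\}}{\#\{\text{induced skills}\}}$$

  • Task Coverage (adoption):

$$\text{Coverage} = \frac{\#\{\text{tasks whose trajectory invokes at least one library skill}\}}{N_{\text{tasks}}}$$

  • Skill Compositionality:

$$\text{Compositionality} = \frac{\#\{\text{skills whose body invokes another skill}\}}{\#\{\text{induced skills}\}}$$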
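Several of these metrics reduce to simple ratios over episode logs. The sketch below computes success rate, average steps over successful episodes, and task coverage; the `Episode` record and the reading of coverage as "at least one skill invoked" are assumptions of this example rather than the paper's exact definitions.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    success: bool
    steps: int
    skills_used: list[str] = field(default_factory=list)


def success_rate(episodes: list[Episode]) -> float:
    return sum(e.success for e in episodes) / len(episodes)


def average_steps(episodes: list[Episode]) -> float:
    """Mean step count over successful episodes only."""
    steps = [e.steps for e in episodes if e.success]
    return sum(steps) / len(steps) if steps else float("nan")


def task_coverage(episodes: list[Episode]) -> float:
    """Fraction of episodes that invoked at least one library skill."""
    return sum(bool(e.skills_used) for e in episodes) / len(episodes)


episodes = [
    Episode(True, 4, ["search"]),
    Episode(True, 6),
    Episode(False, 10, ["search", "checkout"]),
    Episode(True, 5, ["search"]),
]
```

For these four toy episodes, the success rate is 0.75, the average over the three successful runs is 5.0 steps, and coverage is 0.75.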

Quantitative results show PolySkill ahead of the baseline and the ASI system on most splits and sites (ASI is marginally higher on Reddit and Map):

Mind2Web (by split):

Method               Cross-task  Cross-site  Cross-domain
Baseline             53.8%       56.2%       62.3%
ASI (+Online)        59.4%       58.7%       62.1%
PolySkill (+Online)  63.2%       61.3%       63.4%

WebArena (by site):

Method     Shopping  Admin  Reddit  GitLab  Map   Cross-app  Avg
Baseline   37.4      44.0   66.0    38.9    16.4  10.3       38.5
ASI        46.3      53.6   73.7    46.8    21.5  15.1       46.5
PolySkill  51.4      54.8   73.2    54.2    18.9  18.9       49.3

Additional findings:

  • Skill reusability up to 31% (1.7× baseline).
  • >20% reduction in steps per task via skill reuse.
  • Up to 13.9% relative SR improvement on unseen sites (Yu et al., 17 Oct 2025).
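Because the findings mix percentage-point differences with relative improvements, it helps to compute both from the same pair of numbers. The sketch below does this for the three split-level figures reported above for the baseline and PolySkill (+Online); the helper names are hypothetical.

```python
def absolute_gain(baseline: float, method: float) -> float:
    """Difference in percentage points."""
    return method - baseline


def relative_gain(baseline: float, method: float) -> float:
    """Difference as a fraction of the baseline."""
    return (method - baseline) / baseline


# (baseline, PolySkill +Online) pairs from the split-level results above.
splits = {
    "cross-task": (53.8, 63.2),
    "cross-site": (56.2, 61.3),
    "cross-domain": (62.3, 63.4),
}

for name, (base, poly) in splits.items():
    print(
        f"{name}: +{absolute_gain(base, poly):.1f} points, "
        f"{100 * relative_gain(base, poly):.1f}% relative"
    )
```

For the cross-task split, for example, the gain is 9.4 points, or about 17.5% relative to the baseline.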

6. Analytical Results and Ablations

Empirical studies reveal:

  • Skill reuse correlates inversely with average steps (e.g., 20% reuse corresponds to an average of 3.3–4.4 steps per task; PolySkill attains 20.4% reuse by task 180).
  • Continual learning experiments: PolySkill shows superior positive transfer during cross-site adaptation (e.g., from WebArena Shopping to Amazon/Target), with near-zero catastrophic forgetting (retaining original domain SR, in contrast to ASI).
  • Autonomous exploration: In the absence of preset tasks, PolySkill's self-guided curriculum induces generalizable skills, achieving 43.1% SR (vs <38% in single-site curriculum) on shopping sites and 66.2% SR on held-out coding platforms (exceeding static and specialist baselines) (Yu et al., 17 Oct 2025).

7. Limitations and Prospects for Expansion

Key contributions include bringing polymorphic abstraction from object-oriented programming (OOP) to LLM skill induction, modular interfaces that support transfer and composition, and empirical gains under continual and self-supervised learning.

Identified limitations:

  • Skill generality is dependent on the quality of the initial abstract interface; suboptimal abstractions propagate errors.
  • Skills may degrade on dynamic sites with changing DOMs, necessitating periodic re-induction.
  • Generalization to “long-tail” domains not aligning with established abstractions remains a challenge.

Proposed directions:

  • Development of automatic skill-repair mechanisms to update implementations following site changes.
  • Integration of failure analysis routines to improve $G_k$-to-$I_k$ mappings based on unsuccessful bindings.
  • Autonomous RL-based discovery of polymorphic skills with compact specialized models.
  • Community-driven collaborative skill libraries with versioning and quality review (Yu et al., 17 Oct 2025).
References (1)
