- The paper introduces Skill Set Optimization (SSO), which iteratively extracts and refines transferable skills from high-reward subtrajectories, achieving 35-40% performance gains over baselines.
- It utilizes state and action embeddings with a beam search strategy to select diverse, high-quality candidate skills.
- Experimental evaluations in ScienceWorld and NetHack demonstrate SSO’s adaptability and significant improvements over baseline models.
Skill Set Optimization: Enhancing LLMs for Interactive Environments through Transferable Skills
Introduction to Skill Set Optimization (SSO)
Applying LLMs to interactive domains raises the challenge of continually improving behavior from environmental rewards. Skill Set Optimization (SSO) addresses this challenge by constructing and refining a set of transferable skills that enhance LLM actor performance. SSO identifies valuable subtrajectories in interaction histories and extracts, scores, and refines skills that lead to high rewards. These skills are presented in-context to the LLM actor to reinforce beneficial behaviors, and the skill set is further refined by pruning skills that underperform.
Methodology
SSO operates through an iterative process: the current LLM actor interacts with the environment, potential skills are extracted from these interactions based on common, high-reward subtrajectories, and the skill set is refined by evaluating each skill against its observed rewards. This cycle of interaction, extraction, and refinement yields an optimized set of skills that prioritizes transferability and effectiveness.
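To make the procedure concrete, here is a minimal sketch of the loop in Python; the callables (run_episode, extract_skills, refine_skill_set) and their signatures are hypothetical stand-ins for illustration, not the paper's actual interface.

```python
def skill_set_optimization(env, llm_actor, run_episode, extract_skills,
                           refine_skill_set, num_iterations=10):
    """Minimal sketch of the SSO interaction/extraction/refinement cycle.
    All callables are hypothetical stand-ins, not the paper's actual API."""
    skill_set = []                                   # skills shown in-context to the actor
    for _ in range(num_iterations):
        # 1. Interact: the actor acts with the current skill set in its prompt.
        trajectory = run_episode(env, llm_actor, skill_set)
        # 2. Extract: mine common, high-reward subtrajectories into candidate skills.
        candidates = extract_skills(trajectory)
        # 3. Refine: score skills by observed reward and prune the ones that underperform.
        skill_set = refine_skill_set(skill_set + candidates, trajectory)
    return skill_set
```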
Skill Extraction and Construction
SSO extracts pairs of similar subtrajectories using state and action embeddings, scoring them on similarity, reward, and length. By adopting a beam search strategy, SSO selects a set of candidate skills that maximize the weighted sum of these scores, ensuring diversity and coverage. Skills are then generated by abstractly summarizing the commonalities of these subtrajectories into subgoals and instructional actions, promoting task transfer and adaptability.
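The following sketch illustrates one way the pairwise scoring and candidate selection could be implemented; the embedding handling, the weights, and the greedy top-k selection (a simplification of the paper's beam search with diversity and coverage constraints) are all assumptions rather than the paper's exact formulation.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def score_pair(sub_a, sub_b, w_sim=1.0, w_rew=1.0, w_len=0.1):
    """Weighted sum over similarity, reward, and length for a subtrajectory pair.
    Each subtrajectory is a dict with 'embedding' (state/action embedding),
    'reward', and 'steps'. The weights and sign conventions are illustrative."""
    similarity = cosine(sub_a["embedding"], sub_b["embedding"])
    reward = (sub_a["reward"] + sub_b["reward"]) / 2.0
    length = (len(sub_a["steps"]) + len(sub_b["steps"])) / 2.0
    return w_sim * similarity + w_rew * reward - w_len * length

def select_candidates(pairs, top_k=5):
    """Greedy top-k selection of candidate pairs; a simplified stand-in for the
    paper's beam search, which additionally enforces diversity and coverage."""
    ranked = sorted(pairs, key=lambda pair: score_pair(*pair), reverse=True)
    return ranked[:top_k]
```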
Skill Refinement
Skills are further refined based on their performance in subsequent task interactions. The discounted future rewards observed after a skill is executed provide a metric for its effectiveness. Skills that fail to yield positive rewards are pruned, keeping the skill set both compact and conducive to higher task success rates.
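A hedged sketch of this pruning step follows, assuming each skill is a natural-language string and that the reward sequences observed after its executions are recorded; the discount factor and data layout are illustrative choices, not values from the paper.

```python
def discounted_return(rewards, gamma=0.9):
    """Discounted sum of future rewards following one execution of a skill.
    gamma is an illustrative choice, not a value from the paper."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))

def prune_skills(skill_set, executions, gamma=0.9):
    """Keep only skills whose average discounted return after execution is positive.
    `executions` maps each skill (a string) to a list of reward sequences
    observed after that skill was executed."""
    kept = []
    for skill in skill_set:
        returns = [discounted_return(rs, gamma) for rs in executions.get(skill, [])]
        if returns and sum(returns) / len(returns) > 0:
            kept.append(skill)
    return kept
```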
Experimental Evaluation
SSO's performance was evaluated in the text-based ScienceWorld environment and the game-based NetHack environment. SSO outperformed baselines by substantial margins, indicating its effectiveness in constructing meaningful skills that significantly enhance task performance.
- ScienceWorld: In this environment, SSO demonstrated rapid skill adaptation and transfer, significantly outperforming baseline models; an average performance increase of 35% over the previous state-of-the-art model underscores SSO's effectiveness.
- NetHack: This environment posed a distinct challenge with its requirement for low-level navigation actions. Despite this, SSO managed a 40% improvement over baseline models, showcasing its adaptability and the robustness of its skill extraction and refinement methodology.
Theoretical and Practical Implications
SSO introduces a transformative approach to in-context policy improvement for LLM actors in interactive environments. By structuring and refining skills based on environmental feedback, it presents a path toward achieving better task adaptation and generalization. Theoretical implications include insights into how skills can be abstractly represented and transferred across different tasks. Practically, SSO's ability to rapidly learn and adapt these skills holds potential for applications in domains requiring complex decision-making and problem-solving capabilities.
Future Directions
While SSO marks a significant advancement, it also opens avenues for further research, such as improving the extraction mechanism for skills in more complex or noisy environments and exploring methods for leveraging negative feedback more effectively. The adaptability of SSO to environments without explicit intermediate rewards also warrants exploration, potentially expanding its applicability.
Conclusion
Skill Set Optimization signifies a notable progression in optimizing LLM actors for interactive environments through the construction and refinement of transferable skills. Its demonstrated success across diverse domains not only validates its effectiveness but also hints at the broader applicability and potential of LLMs in tasks requiring nuanced understanding and action. As we continue to unravel the capabilities of LLMs, approaches like SSO will be pivotal in harnessing their full potential for complex decision-making tasks.