
Soft Self-Consistency Improves Language Model Agents (2402.13212v2)

Published 20 Feb 2024 in cs.CL, cs.AI, and cs.LG

Abstract: Generations from LLMs can be improved by sampling and scoring multiple solutions to select a final answer. Current "sample and select" methods such as self-consistency (SC) rely on majority voting to score answers. However, when tasks have many distinct and valid answers, selection by voting requires a large number of samples. This makes SC prohibitively expensive for interactive tasks that involve generating multiple actions (answers) sequentially. After establishing that majority voting fails to provide consistent gains on such tasks, we demonstrate how to increase success rates by softening the scoring criterion. We introduce Soft Self-Consistency (SOFT-SC), which replaces SC's discontinuous scoring with a continuous score computed from model likelihoods, allowing for selection even when actions are sparsely distributed. SOFT-SC improves both performance and efficiency on long-horizon interactive tasks, requiring half as many samples as SC for comparable or better performance. For a fixed number of samples, SOFT-SC leads to a 1.3% increase over SC in absolute success rate on writing bash programs, a 6.6% increase on online shopping (WebShop), and a 4.7% increase for an interactive household game (ALFWorld). Finally, we show that SOFT-SC can be applied to both open-source and black-box models.

Authors (4)
  1. Han Wang (420 papers)
  2. Archiki Prasad (18 papers)
  3. Elias Stengel-Eskin (49 papers)
  4. Mohit Bansal (304 papers)
Citations (4)

Summary

Enhancement of LLM Agents via Soft Self-Consistency

Introduction to Soft Self-Consistency (Soft-SC)

LLM agents tasked with interactive, multi-step operations commonly face challenges that significantly affect their performance and efficiency. Traditional methods like self-consistency (SC) address these by sampling multiple solutions and selecting a final answer via majority voting. However, SC's effectiveness drops in scenarios with many diverse valid solutions, because votes can only be tallied over identical actions. This paper introduces Soft Self-Consistency (Soft-SC), which moves beyond exact-match scoring by using a continuous scoring mechanism. The method improves both performance and efficiency, particularly in domains with sparse action spaces, and a notable result is that Soft-SC attains better performance than SC with fewer samples across a range of tests.

Methodological Innovations

Soft-SC's Core Concept

Soft-SC diverges from SC's reliance on exact matches for scoring, instead utilizing a continuous score calculated from model likelihoods. This approach enables effective action selection among sparsely distributed options, showcasing its utility in interactive tasks with multiple valid answers per step.
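The idea can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's exact implementation: it assumes each sampled action comes with the per-token log-probabilities the model assigned to it, and uses the mean log-probability as the aggregator (the paper explores likelihood-based aggregation; the candidate data here is purely hypothetical).

```python
def soft_sc_score(token_logprobs):
    """Aggregate per-token log-probabilities into one continuous score.

    Mean log-probability is one plausible aggregator; alternatives
    (e.g., sum or minimum) could be substituted here.
    """
    return sum(token_logprobs) / len(token_logprobs)

def select_action(candidates):
    """Pick the sampled action with the highest aggregated likelihood.

    `candidates` maps each action string to the log-probabilities of
    its tokens (an illustrative data shape, not the paper's API).
    """
    return max(candidates, key=lambda a: soft_sc_score(candidates[a]))

# Toy example: three distinct sampled actions. Under majority voting
# none would win (each appears once), but Soft-SC can still rank them.
candidates = {
    "ls -la":   [-0.1, -0.3],        # mean -0.2
    "ls --all": [-0.4, -0.9, -0.8],  # mean -0.7
    "dir":      [-2.0],              # mean -2.0
}
best = select_action(candidates)  # "ls -la"
```

Note how this sidesteps SC's failure mode: even when every sample is distinct, the continuous score still induces a ranking, so no votes need to coincide.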

Adaptive Sampling

Soft-SC incorporates an adaptive sampling strategy that dynamically adjusts the number of samples based on the convergence of scores towards a threshold. This refinement not only enhances sample efficiency but also contributes to superior task performance with a smaller sampling footprint.

Empirical Evaluations

The paper's experimental analysis reveals several key findings:

  • Soft-SC consistently outperforms SC and greedy decoding baselines across diverse interactive tasks, demonstrating substantial improvements in success rates with fewer samples.
  • Importantly, Soft-SC's benefits scale with increased model size, suggesting that larger models can further leverage this method for performance gains.
  • Additionally, Soft-SC is adaptable to both open-source and proprietary black-box models, broadening its applicability.

Practical and Theoretical Implications

Soft-SC presents a meaningful advancement in the field of LM agents, particularly for applications involving complex sequences of actions. This method's ability to efficiently handle diversity in valid actions and improve upon existing selection methodologies points to significant potential for enhancing interactive AI systems. Theoretically, Soft-SC's approach to scoring adds a new dimension to understanding how LLMs can be optimized for varied and nuanced tasks, promoting further research into continuous scoring mechanisms.

Future Directions and Considerations

The introduction of Soft-SC opens avenues for future exploration, including its integration with other AI optimization techniques and the extension to more diverse tasks beyond the ones tested. Additionally, considering its performance improvements and efficiency gains, subsequent studies could investigate Soft-SC's applicability in real-world scenarios, where LLM agents are tasked with navigating complex environments or performing intricate sequences of actions.

Conclusion

In summary, Soft Self-Consistency offers a robust and efficient method for improving the performance of LLM agents across a range of interactive tasks. By addressing the limitations inherent in traditional majority voting approaches, Soft-SC provides a compelling solution that enhances both the accuracy and efficiency of LLM agents, setting a new benchmark for future developments in the field.
