- The paper presents ENVISIONS, a framework that reduces reliance on human annotation by employing neural-symbolic self-training for LLMs.
- It uses a three-stage process—self-exploration, self-refinement, and self-rewarding—to iteratively enhance symbolic reasoning capabilities.
- The method achieves performance improvements of up to 30%, outperforming traditional fine-tuning and RL-based self-training approaches.
Interactive Evolution: A Neural-Symbolic Self-Training Framework For LLMs
The paper "Interactive Evolution: A Neural-Symbolic Self-Training Framework For LLMs" addresses the challenge of reducing the reliance on human-annotated data for the fine-tuning of LLMs. The authors propose a novel framework named ENVISIONS, which aims to enhance the capabilities of LLMs by employing a neural-symbolic self-training methodology. This approach is designed to manage the scarcity of symbolic data and improve the proficiency of LLMs in processing symbolic language (SL).
Framework and Methodology
ENVISIONS is predicated on an "environment-guided" self-training strategy. The framework iteratively interacts with an embodied environment to gather training data, which alleviates the need for extensive human annotation. The process involves three main stages (a minimal Python sketch follows the list):
- Self-Exploration: The weak LLM generates multiple candidate symbolic solutions for a given task, which are then executed in the environment; the environment returns binary feedback on the correctness of each solution.
- Self-Refinement: Using the initial solutions as references, the LLM generates refined symbolic solutions, which are likewise executed and their feedback recorded. This step polishes the solutions toward higher accuracy.
- Self-Rewarding: A soft reward score is calculated for each solution based on its execution probability, without the need for an external reward model. This score reflects the quality of symbolic solutions, thereby aiding in the reinforcement of effective solutions.
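The loop below is a minimal sketch of one such iteration for a single task. The `llm` and `env` objects, their methods (`generate`, `sequence_logprob`, `execute`), and the use of a length-normalized generation probability as the soft reward are illustrative assumptions, not the authors' implementation.

```python
import math
from dataclasses import dataclass

@dataclass
class Trajectory:
    question: str
    solution: str        # candidate symbolic solution (e.g., a program)
    correct: bool        # binary feedback from the environment
    soft_reward: float   # soft score derived from the model itself

def compute_soft_reward(llm, question, solution):
    # Length-normalized sequence probability as a proxy for solution quality;
    # this formulation is an assumption, not the paper's exact definition.
    logp = llm.sequence_logprob(question, solution)
    return math.exp(logp / max(len(solution.split()), 1))

def explore_refine_reward(llm, env, question, k=5):
    trajectories = []

    # 1) Self-exploration: sample k candidate symbolic solutions.
    candidates = [llm.generate(question) for _ in range(k)]

    # 2) Self-refinement: condition on each draft to produce a revised solution.
    refinements = [llm.generate(question, reference=sol) for sol in candidates]

    # 3) Self-rewarding: score every solution with binary environment feedback
    #    plus a soft reward, without any external reward model.
    for sol in candidates + refinements:
        trajectories.append(Trajectory(
            question=question,
            solution=sol,
            correct=env.execute(question, sol),   # binary feedback
            soft_reward=compute_soft_reward(llm, question, sol),
        ))
    return trajectories
```

The sketch covers a single task; in the framework this process repeats across the unlabeled task set in every iteration, and only the flow of data from exploration through refinement to rewarding is intended to be faithful here.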
The framework then filters these rewarded trajectories using a combination of binary and soft rewards and uses them to update a candidate pool. Subsequent iterations optimize the policy model on the new data, combining supervised fine-tuning (SFT) with a specially designed RL-free loss function that lets the model learn from its mistakes.
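A hedged sketch of how the filtering and pool update might look, reusing the `Trajectory` objects from the previous snippet; the top-n selection and the positive/negative pairing are illustrative choices rather than the paper's exact recipe.

```python
# Hypothetical filtering step: keep the highest-scoring executable positives
# and, for contrastive training, matched negatives ranked by soft reward.
def update_candidate_pool(pool, trajectories, top_n=2):
    positives = sorted((t for t in trajectories if t.correct),
                       key=lambda t: t.soft_reward, reverse=True)[:top_n]
    negatives = sorted((t for t in trajectories if not t.correct),
                       key=lambda t: t.soft_reward, reverse=True)[:top_n]
    # Positives feed supervised fine-tuning; (positive, negative) pairs feed
    # an RL-free contrastive objective, sketched later in this summary.
    pool.setdefault("sft", []).extend(positives)
    pool.setdefault("pairs", []).extend(zip(positives, negatives))
    return pool
```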
Datasets and Baselines
The framework was extensively evaluated across three domains: web agent tasks (MiniWob++), math reasoning (e.g., GSM8K, MATH), and logical reasoning (e.g., ProofWriter, RuleTaker). Baseline comparisons included approaches such as Distill-then-Finetune using teacher models like GPT-4 and Claude-2, as well as RL-based iterative self-training methods.
Results and Analysis
The results underscore the effectiveness of ENVISIONS:
- LLaMA2-Chat (7B) and LLaMA2-Chat (13B) models demonstrated significant performance improvements with ENVISIONS, with average gains of approximately 30.00% and 24.95%, respectively.
- ENVISIONS outperformed the Distill-then-Finetune approach by 5.66%-7.13% and demonstrated superior sustainability and training efficiency compared to RL-based self-training methods.
Detailed analyses show that ENVISIONS strikes a balance between exploratory ability and stability that is crucial for effective self-training. Careful trajectory filtering, the self-reward mechanism, and the RL-free loss together maintain a clear margin between positive and negative solutions, which makes LLM optimization efficient.
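As one plausible instantiation of such an RL-free objective (not the paper's exact formulation), a margin-based loss over (positive, negative) solution pairs widens exactly that distinction; the soft-reward weighting below is an assumption.

```python
import torch
import torch.nn.functional as F

def rl_free_loss(pos_logprob, neg_logprob, pos_soft, neg_soft, beta=1.0):
    # Contrastive, RL-free objective on (positive, negative) solution pairs:
    # push the model's log-likelihood of the correct solution above that of
    # the incorrect one, with the gap weighted by the soft-reward difference.
    margin = pos_logprob - neg_logprob                 # per-pair likelihood gap
    weight = (pos_soft - neg_soft).clamp(min=0.0)      # emphasize confident pairs
    return -(weight * F.logsigmoid(beta * margin)).mean()
```

In practice this term would be added to the SFT loss on positive trajectories, so the model both imitates successful solutions and is penalized for assigning them less probability than failed ones.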
Implications and Future Directions
The implications of this research are substantial for both practical applications and theoretical advancements. Practically, the reduced dependency on human-annotated data and the ability to self-improve through interaction with environments make LLMs more scalable and cost-effective. Theoretically, the neural-symbolic integration presents pathways to enhance the reasoning capabilities of LLMs, enabling them to tackle more complex tasks.
Future research could explore the synergy between ENVISIONS and other self-training methods to further optimize performance or extend the framework to other domains, such as visual environments or robotic control.
Overall, the paper provides a robust contribution to the field of AI, presenting a viable and efficient method for evolving LLMs from weak to strong without extensive human-annotated training data. The insights from this research pave the way for further exploration and innovation in self-training methodologies.