Efficient Exploration for LLMs

Published 1 Feb 2024 in cs.LG, cs.AI, cs.CL, stat.ME, and stat.ML | (2402.00396v2)

Abstract: We present evidence of substantial benefit from efficient exploration in gathering human feedback to improve LLMs. In our experiments, an agent sequentially generates queries while fitting a reward model to the feedback received. Our best-performing agent generates queries using double Thompson sampling, with uncertainty represented by an epistemic neural network. Our results demonstrate that efficient exploration enables high levels of performance with far fewer queries. Further, both uncertainty estimation and the choice of exploration scheme play critical roles.

Summary

The paper demonstrates that active exploration via double Thompson sampling can cut the number of human-feedback queries needed to reach a given performance level by nearly an order of magnitude.
It contrasts passive exploration and point-estimate methods with methods that estimate uncertainty using an epistemic neural network.
Empirical results indicate that choosing queries actively speeds up the feedback loop, which the authors suggest could shorten the path to superhuman LLM performance.

Introduction

How efficiently an LLM explores matters most when it gathers human feedback and learns from it. Traditional methods often rely on passive strategies that do not take full advantage of the information each human interaction can provide. This paper by Dwaracherla et al. focuses on active exploration, in particular double Thompson sampling, to make querying for human feedback substantially more efficient.

Experimentation and Results

The researchers present a series of experiments comparing active and passive exploration. Among the active strategies, Boltzmann exploration relies on a point-estimate reward model and favors responses predicted to have higher reward. Two further strategies use uncertainty estimates from an epistemic neural network (ENN): an infomax algorithm, which selects queries to maximize the information gained from feedback, and double Thompson sampling, which favors responses with a high probability of being optimal.
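The difference between the two selection rules can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: a small ensemble of reward estimates stands in for the ENN, and the function names, the `temperature` parameter, the `max_tries` cap, and the random fallback are all assumptions made for the example.

```python
import numpy as np

def boltzmann_pair(point_rewards, temperature, rng):
    """Pick two distinct responses with probability proportional to
    exp(reward / temperature), using a single point-estimate reward."""
    logits = np.asarray(point_rewards, dtype=float) / temperature
    probs = np.exp(logits - logits.max())  # subtract max for stability
    probs /= probs.sum()
    first, second = rng.choice(len(probs), size=2, replace=False, p=probs)
    return int(first), int(second)

def double_thompson_pair(ensemble_rewards, rng, max_tries=30):
    """Pick two distinct responses, each the best under an independently
    sampled reward hypothesis.

    ensemble_rewards has shape (num_models, num_responses); each row is
    one hypothesis about the rewards (an assumed stand-in for an ENN
    evaluated at one epistemic index).
    """
    rewards = np.asarray(ensemble_rewards, dtype=float)
    num_models, num_responses = rewards.shape
    first = int(np.argmax(rewards[rng.integers(num_models)]))
    for _ in range(max_tries):
        second = int(np.argmax(rewards[rng.integers(num_models)]))
        if second != first:
            return first, second
    # Assumed fallback: every sampled hypothesis agreed on the winner,
    # so choose the second response uniformly from the rest.
    others = [i for i in range(num_responses) if i != first]
    return first, int(rng.choice(others))

rng = np.random.default_rng(0)
print(boltzmann_pair([0.1, 2.0, -1.0, 0.5], temperature=1.0, rng=rng))
print(double_thompson_pair([[1.0, 0.0, 0.3], [0.2, 0.9, 0.1]], rng))
```

In this sketch, Boltzmann exploration only sees one reward estimate per response, while double Thompson sampling pairs responses that different plausible reward hypotheses each consider best, so the pair tends to be one whose comparison is still unresolved.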

The empirical results show that active exploration with double Thompson sampling reaches high performance with fewer queries than the alternatives. For a given performance level, double Thompson sampling required nearly an order of magnitude fewer queries than passive exploration.

The Significance of Uncertainty Estimation

Uncertainty estimation is central to how well an exploration algorithm performs. The paper stresses that algorithms that estimate uncertainty, such as those built on ENNs, outperform those that rely only on a point-estimate reward model. For instance, double Thompson sampling showed a marked improvement over Boltzmann exploration, which uses no uncertainty estimate.
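One way to see what an uncertainty estimate adds is to measure how much a set of reward hypotheses disagrees about a pairwise preference. The sketch below is an illustration under stated assumptions, not the paper's code: an ensemble again stands in for the ENN, preferences follow a logistic (Bradley-Terry style) model of the reward difference, and the pair is scored by the variance of the preference probability across ensemble members. The names `preference_prob` and `max_variance_pair` are hypothetical.

```python
import itertools
import numpy as np

def preference_prob(reward_a, reward_b):
    """Assumed logistic model: probability that response a is preferred
    to response b given their rewards."""
    return 1.0 / (1.0 + np.exp(-(reward_a - reward_b)))

def max_variance_pair(ensemble_rewards):
    """Return the pair of responses whose preference probability varies
    most across ensemble members, together with that variance.

    ensemble_rewards has shape (num_models, num_responses).
    """
    rewards = np.asarray(ensemble_rewards, dtype=float)
    best_pair, best_var = None, -1.0
    for a, b in itertools.combinations(range(rewards.shape[1]), 2):
        probs = preference_prob(rewards[:, a], rewards[:, b])
        var = float(np.var(probs))
        if var > best_var:  # ties keep the earliest pair
            best_pair, best_var = (a, b), var
    return best_pair, best_var

# Two hypotheses agree on responses 0 and 1 but disagree on response 2.
pair, variance = max_variance_pair([[0.0, 0.0, 2.0],
                                    [0.0, 0.0, -2.0]])
print(pair, variance)
```

A point-estimate reward model collapses the rows of `ensemble_rewards` into one, so every pair has zero variance and this signal disappears. That is the information an uncertainty-aware method can use and Boltzmann exploration cannot.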

Implications and Future Directions

The findings bear directly on how LLMs are developed. The authors suggest that active exploration could shorten the time needed to reach "superhuman" levels of performance from decades to considerably less. The paper also points to directions for future work, notably more sophisticated ENN architectures and exploration in multi-turn dialogue.

In conclusion, the paper delivers compelling evidence for the efficacy of double Thompson sampling in active exploration. By prioritizing uncertainty and information gain, the proposed method accelerates the feedback loop, bringing us closer to realizing the full potential of LLMs in applications that require human-like decision-making and creativity.