Uncertainty Estimation for Language Reward Models

Published 14 Mar 2022 in cs.CL, cs.AI, and cs.LG (arXiv:2203.07472v1)

Abstract: LLMs can learn a range of capabilities from unsupervised training on text corpora. However, to solve a particular problem (such as text summarization) it is typically necessary to fine-tune them on a task-specific dataset. It is often easier for humans to choose between options than to provide labeled data, and prior work has achieved state-of-the-art performance by training a reward model from such preference comparisons. However, collecting a large preference comparison dataset is still expensive -- and the learned reward models are unreliable out-of-distribution. We seek to address these problems via uncertainty estimation, which can improve sample efficiency and robustness using active learning and risk-averse reinforcement learning (RL). Specifically, we use bootstrap aggregating (bagging) to train an ensemble of reward models differing in the initialization of their final layer. Ensembles have proved successful in prior applications of active learning, but we find that in our setting ensemble active learning does not outperform random sampling. Further experiments show that while the aggregate predictions are well-calibrated, the ensemble's estimated epistemic uncertainty is only weakly correlated with model error. We suspect this is because the ensemble members are fine-tuned from a single model and so are similar to one another. This suggests current pre-training methods will need to be modified to support uncertainty estimation, e.g. by training multiple LLMs.


Summary

  • The paper demonstrates that ensembles of language reward models yield uncertainty estimates that correlate only weakly (Spearman r = 0.36) with model error.
  • It employs bootstrap aggregating and active learning with Thompson sampling to evaluate epistemic and aleatoric uncertainty in fine-tuned models.
  • The study highlights a trade-off between leveraging pre-trained models for efficiency and achieving the diversity necessary for reliable uncertainty estimation.

Uncertainty Estimation with Language Reward Models

This paper investigates the efficacy of ensemble methods for uncertainty estimation in language reward models, particularly in the context of fine-tuning pre-trained LLMs for text summarization. The central hypothesis is that by quantifying model uncertainty, one can improve sample efficiency via active learning and enhance the robustness of reinforcement learning (RL) fine-tuning. The paper explores whether an ensemble of fine-tuned LLMs can provide accurate uncertainty estimates, using bootstrap aggregating (bagging) to train an ensemble of reward models.

Methodology and Experiments

The authors fine-tune a pre-trained LLM to learn a reward model from human feedback, specifically preference comparisons between different textual outputs. An ensemble of reward models is created by reinitializing the final layer of each model and fine-tuning the entire network based on human preference comparisons. The diversity in the ensemble arises from both the initialization of the final layer and bootstrap sampling of the dataset.
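The setup described above can be sketched in miniature. The following is a hedged illustration, not the paper's implementation: a fixed random projection stands in for the shared pre-trained LLM features, and each ensemble member is a freshly initialized linear reward head trained on a bootstrap resample of synthetic preference pairs via the standard Bradley-Terry preference loss. All dimensions, learning rates, and data are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the shared pre-trained network: a fixed nonlinear feature map.
D_IN, D_FEAT, N_PAIRS, N_MEMBERS = 8, 16, 200, 4
W_shared = rng.normal(size=(D_IN, D_FEAT))
features = lambda x: np.tanh(x @ W_shared)

# Synthetic preference comparisons: option a is preferred when a hidden
# linear "true reward" ranks it above option b.
w_true = rng.normal(size=D_FEAT)
xa, xb = rng.normal(size=(N_PAIRS, D_IN)), rng.normal(size=(N_PAIRS, D_IN))
labels = (features(xa) @ w_true > features(xb) @ w_true).astype(float)

def train_head(idx, seed):
    """Fit one linear reward head on a bootstrap sample by gradient
    descent on the Bradley-Terry preference loss."""
    r = np.random.default_rng(seed)
    w = r.normal(scale=0.1, size=D_FEAT)          # fresh final-layer init
    fa, fb, y = features(xa[idx]), features(xb[idx]), labels[idx]
    for _ in range(400):
        p = 1.0 / (1.0 + np.exp(-(fa @ w - fb @ w)))  # P(a preferred over b)
        w -= 0.05 * (fa - fb).T @ (p - y) / len(idx)
    return w

# Bagging: each member trains on its own bootstrap resample of the pairs.
heads = [train_head(rng.integers(0, N_PAIRS, N_PAIRS), s)
         for s in range(N_MEMBERS)]

# Ensemble reward on new inputs: mean across members is the prediction,
# std across members is the epistemic-uncertainty proxy.
x = rng.normal(size=(5, D_IN))
rewards = np.stack([features(x) @ w for w in heads])  # (members, inputs)
print(rewards.mean(axis=0), rewards.std(axis=0))
```

Because every head sits on the same frozen features, the members can only disagree through their heads and bootstrap samples, which mirrors the limited-diversity concern the paper raises.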

The study employs active learning to evaluate the quality of uncertainty estimates on real human data. Active learning is implemented using Thompson sampling and by selecting data points with maximal variance between the predictions of ensemble members. Additionally, experiments are conducted on a synthetic dataset where aleatoric uncertainty is known, allowing for a more precise evaluation of epistemic uncertainty. The Spearman correlation between model error and estimated epistemic uncertainty is used as the key metric.

Key Findings and Implications

The paper reports that ensemble active learning does not significantly outperform random sampling in the experiments. Furthermore, while the aggregate predictions of the ensemble are well-calibrated, the estimated epistemic uncertainty exhibits only a weak correlation with model error. Specifically, the maximum Spearman correlation observed is r = 0.36, which explains only about 13% of the variance in model error. This suggests that the ensemble members, being fine-tuned from a single pre-trained model, lack sufficient diversity to provide reliable uncertainty estimates.

The authors conjecture that there exists a trade-off between leveraging pre-trained models for sample efficiency and achieving accurate uncertainty estimation. While fine-tuning from a pre-trained model enhances sample efficiency, it may also limit the diversity necessary for effective uncertainty estimation. This casts doubt on the prevailing paradigm of relying on a single, large pre-trained model and suggests that training multiple distinct, smaller pre-trained models or introducing uncertainty into the pre-training process (e.g., via dropout) might be more beneficial for uncertainty estimation.

Discussion and Future Directions

The paper identifies several limitations, including the focus on a single task (discriminating between good and bad summaries) and the unexplored potential of alternative uncertainty estimation methods. The authors propose that future research should explore linear hypermodels and fine-tuning only the biases to improve uncertainty quality. The paper concludes by emphasizing the need to modify foundation model training procedures to incorporate uncertainty estimation or to develop new methods that can tolerate the lack of diversity when fine-tuning from pre-trained models.
