- The paper evaluates how well large language models align with diverse demographic opinions, introducing the NYT Book Opinions dataset and exploring factors like question domain, steering, and expression methods.
- Key findings indicate that verbalizing opinion distributions yields more accurate alignment than relying on model log-probabilities or sampled sequences, both of which can introduce bias.
- The study also finds that few-shot steering with examples of group-specific opinions improves alignment accuracy and mitigates the stereotypical biases often introduced by persona-based steering.
Benchmarking Distributional Alignment of LLMs
The paper "Benchmarking Distributional Alignment of LLMs" by Nicole Meister, Carlos Guestrin, and Tatsunori Hashimoto addresses critical aspects of LLMs' (LMs) ability to achieve distributional alignment, specifically the alignment of model outputs with the diverse opinions of distinct demographic groups. As LLMs like GPT-4, GPT-3.5, and others are increasingly tasked with simulating human behavior across various applications, understanding and improving distributional alignment becomes pertinent.
Research Goals and Methodology
The research primarily seeks to evaluate and benchmark how well LMs align with demographic opinion distributions. This involves exploring three major factors: the question domain, the steering method, and the distribution expression method. The paper introduces a new dataset, NYT Book Opinions, which encompasses opinions beyond political and cultural spheres, contributing to a deeper understanding of LMs' performance in non-traditional domains.
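To make the benchmarking setup concrete, the sketch below scores a model's predicted opinion distribution against a survey-derived reference distribution. The total variation metric, the helper names, and the example numbers are illustrative assumptions, not necessarily the paper's exact evaluation protocol.

```python
# Minimal sketch of scoring distributional alignment (illustrative, not the
# paper's exact protocol). A model's predicted answer distribution for one
# survey question is compared against the reference distribution for a
# demographic group using total variation distance (lower = better aligned).

def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Total variation distance between two distributions over answer choices."""
    choices = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in choices)

# Reference distribution from survey data (hypothetical numbers).
survey_dist = {"Agree": 0.55, "Neutral": 0.25, "Disagree": 0.20}

# Distribution predicted by the LM for the same group and question.
model_dist = {"Agree": 0.70, "Neutral": 0.20, "Disagree": 0.10}

print(f"TV distance: {total_variation(survey_dist, model_dist):.3f}")  # 0.150
```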
Key Findings
- Distribution Expression Method: The paper shows that verbalizing distributions (output as JSON) yields more accurate alignment than extracting model log-probabilities or sampling sequences of tokens. Verbalization more faithfully reflects the LM's knowledge of the distribution, while sampling-based methods introduce biases, as demonstrated in the paper's biased coin flip experiment (see the first sketch after this list).
- Steering Methods: Persona steering, where prompts instruct the LM to respond as a member of a demographic group, often leads to stereotypical and inaccurate outputs. Conversely, few-shot steering, which supplies examples of the group's actual opinions, improves alignment accuracy and mitigates these stereotypical biases (see the second sketch after this list).
- Dataset Variability: LMs align more accurately with datasets like OpinionQA, whose questions are explicitly political or cultural, than with NYT Book Opinions, where group preferences must be inferred from more abstract signals such as book interests. This suggests that alignment is harder in these indirect domains.
- Human Comparisons: Human baseline annotators perform comparably to LMs at distributional alignment, underscoring that, despite LMs' extensive training, they do not clearly surpass human accuracy in predicting demographic opinion distributions, a non-trivial task given inherent biases and stereotyping tendencies.
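The first sketch illustrates the contrast between two distribution expression methods: asking the model to verbalize the distribution directly as JSON versus estimating it from repeated sampling. The prompt wording and the `query_model` helper are placeholders, not the paper's actual setup.

```python
import json
from collections import Counter

# Two ways to elicit an opinion distribution from an LM (illustrative sketch).
# query_model is a hypothetical callable: prompt string in, completion string out.

QUESTION = "Should college be free for all students?"
CHOICES = ["Agree", "Neutral", "Disagree"]

def verbalized_distribution(query_model) -> dict[str, float]:
    """Ask the model to state the group's opinion distribution directly as JSON."""
    prompt = (
        f"Question: {QUESTION}\n"
        f"Choices: {CHOICES}\n"
        "What fraction of the group would pick each choice? "
        'Answer as JSON, e.g. {"Agree": 0.5, "Neutral": 0.3, "Disagree": 0.2}.'
    )
    return json.loads(query_model(prompt))

def sampled_distribution(query_model, n_samples: int = 100) -> dict[str, float]:
    """Estimate the distribution empirically from many single-answer completions."""
    prompt = f"Question: {QUESTION}\nAnswer with exactly one of {CHOICES}."
    counts = Counter(query_model(prompt).strip() for _ in range(n_samples))
    return {c: counts[c] / n_samples for c in CHOICES}
```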
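The second sketch contrasts the two steering strategies. The prompt templates below are illustrative assumptions rather than the paper's exact prompts.

```python
# Contrasting persona steering and few-shot steering (illustrative templates).

def persona_prompt(question: str, group: str) -> str:
    """Persona steering: ask the model to adopt a demographic identity."""
    return (
        f"You are a {group}. Answer as this persona would.\n"
        f"Question: {question}"
    )

def few_shot_prompt(question: str, group: str, examples: list[tuple[str, dict]]) -> str:
    """Few-shot steering: show the group's actual response distributions on other questions."""
    shots = "\n".join(
        f"Question: {q}\n{group} response distribution: {dist}"
        for q, dist in examples
    )
    return (
        f"{shots}\n"
        f"Question: {question}\n"
        f"{group} response distribution:"
    )
```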
Challenges and Implications
The research challenges traditional log-probability-based evaluation methods, showing that they underestimate LM performance, primarily because models become miscalibrated when steered. Future work should focus on improving the sampling capabilities of LMs, addressing the knowledge-to-simulation gap in which LMs ‘know’ a distribution but fail to sample from it accurately.
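One way to make the knowledge-to-simulation gap concrete is to compare the distribution a model states when asked directly with the empirical distribution recovered from repeated sampling. The sketch below reuses the hypothetical helpers from the earlier examples and is an illustration, not the paper's evaluation code.

```python
# Quantifying a knowledge-to-simulation gap (illustrative): compare what the
# model *says* the distribution is with what it *produces* when sampled.
# Assumes verbalized_distribution, sampled_distribution, and total_variation
# from the earlier sketches are in scope; query_model is still a placeholder.

def knowledge_to_simulation_gap(query_model, n_samples: int = 100) -> float:
    stated = verbalized_distribution(query_model)              # what the model "knows"
    simulated = sampled_distribution(query_model, n_samples)   # what it actually samples
    return total_variation(stated, simulated)  # large value => poor sampling fidelity
```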
Beyond these evaluation challenges, the paper highlights risks associated with LLM deployments, particularly the reinforcement of stereotypes and misaligned simulations of human opinions. Careful consideration is advised when deploying LMs for tasks requiring nuanced understanding and representation of diverse opinions, and models should be monitored and evaluated across varied, context-dependent queries.
Future Directions
The paper outlines open problems and opportunities for improving distributional alignment, such as refining steering techniques and exploring alternative model architectures that better represent the plurality of human opinions. Investigating the limitations of current datasets and the efficacy of diverse prompting strategies remains crucial for advancing the alignment capabilities of LLMs in both research and applied settings. The established benchmark provides a solid foundation for future work toward more socially accountable and transparent LLMs.