- The paper presents a multi-stage framework where public input is systematically incorporated into LM training to align model behavior with societal values.
- The methodology has participants vote on seed statements and contribute their own, with a group-aware consensus metric applied during input transformation to reduce researcher bias.
- Evaluation reveals that the Public model exhibits lower bias across social dimensions while maintaining performance parity on language and math benchmarks.
Collective Constitutional AI: A Multi-Stage Framework for Publicly Informed LLM Training
The paper "Collective Constitutional AI: Aligning a LLM with Public Input," authored by Huang et al., presents an innovative framework aimed at integrating public input into the development and training of LLMs (LMs). This framework, named Collective Constitutional AI (CCAI), is designed to elicit and incorporate normative principles from a broad population, thereby aligning LM behavior with collective public values and preferences. This paper explores the CCAI framework's practical implementation and evaluates its effects on the resultant LMs compared to traditionally trained models.
Overview of CCAI Framework
The CCAI framework breaks the process into multiple stages, beginning with participant selection and proceeding through input elicitation, input transformation, model training, and model evaluation. Each stage introduces specific decision points about how public preferences and values are operationalized. The authors emphasize reducing researcher bias throughout these stages through mechanisms such as consensus-based statement selection and predefined moderation criteria.
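To make the stage structure concrete, here is a minimal sketch of the pipeline as a sequence of state-transforming steps. The function names and payloads are illustrative assumptions for this summary, not the authors' code.

```python
from typing import Callable

# Illustrative stand-ins for the paper's five stages; each stage reads the
# accumulated state and adds its own output, mirroring how each CCAI stage
# depends on decisions made upstream.
STAGES: list[tuple[str, Callable[[dict], dict]]] = [
    ("participant_selection", lambda s: {**s, "participants": 1002}),
    ("input_elicitation",     lambda s: {**s, "statements": 1127}),
    ("input_transformation",  lambda s: {**s, "principles": "GAC-filtered, deduplicated"}),
    ("model_training",        lambda s: {**s, "model": "Public-constitution CAI model"}),
    ("model_evaluation",      lambda s: {**s, "evals": ["MMLU", "GSM8K", "BBQ", "OpinionQA"]}),
]

def run_pipeline(state: dict) -> dict:
    for _name, stage in STAGES:
        state = stage(state)
    return state

print(run_pipeline({}))
```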
Participant Selection and Input Elicitation
The authors selected a representative sample of U.S. adults (n = 1,002) across demographics such as age, gender, income, and geography, screening participants for basic familiarity with generative AI. Input was collected through a web app built on the Polis platform, where participants voted on seed statements and submitted their own. This process yielded 1,127 statements and detailed voting records.
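A rough sketch of the kind of voting record such an elicitation produces: each participant agrees with, disagrees with, or passes on each statement, and the records pivot into a participant-by-statement matrix for downstream analysis. The field names here are illustrative, not the Polis export schema.

```python
from dataclasses import dataclass

@dataclass
class Vote:
    participant_id: int
    statement_id: int
    value: int  # +1 agree, -1 disagree, 0 pass

# A handful of toy votes; real exports carry far more metadata.
votes = [
    Vote(0, 0, +1), Vote(0, 1, -1),
    Vote(1, 0, +1), Vote(1, 1, 0),
]

# Pivot into a sparse participant-by-statement matrix, the shape that
# opinion clustering and consensus scoring operate on.
matrix = {(v.participant_id, v.statement_id): v.value for v in votes}
print(matrix)
```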
Input Transformation
After elicitation, input statements were filtered and selected by group-aware consensus (GAC). The GAC metric favors statements with broad agreement across distinct opinion clusters over polarizing ones. The authors then merged near-duplicate statements to avoid redundancy and reformulated the remainder as principles in the format Constitutional AI training expects.
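The paper defines GAC, roughly, as the product across opinion clusters of the estimated probability that a cluster member agrees with a statement, so a statement must earn support in every cluster to score well. The sketch below follows that definition; the +1/+2 smoothing is an assumption modeled on Polis conventions rather than the authors' exact estimator.

```python
from math import prod

def group_agree_prob(agrees: int, total_votes: int) -> float:
    # Smoothed estimate of P(agree) within one opinion group.
    return (agrees + 1) / (total_votes + 2)

def gac(votes_by_group):
    # votes_by_group: (agrees, total votes) per opinion cluster. Taking the
    # product means one dissenting cluster drags the whole score down.
    return prod(group_agree_prob(a, n) for a, n in votes_by_group)

print(gac([(90, 100), (85, 100)]))  # broad agreement -> ~0.75
print(gac([(98, 100), (10, 100)]))  # polarized       -> ~0.10
```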
Model Training and Evaluation
Two models were trained with identical Constitutional AI protocols, differing only in their constitutions: one used the Public constitution derived from participant input, the other a Standard constitution drawn from existing ethical guidelines. Holding the training protocol fixed attributes any behavioral differences to the constitutions alone. Evaluations covered language understanding (MMLU), mathematical problem solving (GSM8K), social bias (BBQ), and political ideology reflection (OpinionQA).
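Both trainings share the Constitutional AI critique-revision loop; only the set of constitutional principles differs. A minimal sketch of that loop, assuming a stubbed `generate` in place of a real model call and an invented example principle:

```python
import random

def generate(prompt: str) -> str:
    # Placeholder for a real LM completion call; an assumption of this sketch.
    return f"<completion for: {prompt[:40]}...>"

def critique_and_revise(user_prompt: str, constitution: list[str]) -> str:
    response = generate(user_prompt)
    # Constitutional AI samples a principle, critiques the draft against it,
    # then revises; revisions become supervised fine-tuning targets.
    principle = random.choice(constitution)
    critique = generate(
        f"Critique this response against the principle '{principle}':\n{response}"
    )
    return generate(f"Revise the response to address this critique:\n{critique}")

public_constitution = ["Provide balanced, fact-based information accessible to all."]
print(critique_and_revise("Is nuclear power safe?", public_constitution))
```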
Results and Analysis
Quantitative Results: The Public model demonstrated lower bias than the Standard model across the nine social dimensions of the BBQ benchmark while maintaining performance parity on the language (MMLU) and math (GSM8K) evaluations. Human evaluators rated both models similarly for helpfulness and harmlessness.
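For context on what "lower bias" measures, below is a hedged sketch of the BBQ-style bias score: the rate of stereotype-consistent answers among non-"unknown" answers, rescaled to [-1, 1]. The per-dimension counts are invented for illustration, not the paper's results.

```python
def bbq_bias_score(n_biased: int, n_non_unknown: int) -> float:
    # Among answers that are not "unknown", how often the model picks the
    # stereotype-consistent option; 0 = no measured bias, +1/-1 = fully
    # (counter-)stereotypical.
    if n_non_unknown == 0:
        return 0.0
    return 2 * (n_biased / n_non_unknown) - 1

# Toy per-dimension counts: (stereotype-consistent answers, non-"unknown" answers).
dimensions = {"age": (60, 100), "gender": (55, 100), "religion": (52, 100)}
for dim, (biased, answered) in dimensions.items():
    print(f"{dim}: {bbq_bias_score(biased, answered):+.2f}")
```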
Qualitative Results: Comparative analysis revealed that the Public model's constitution emphasized objectivity, accessibility, and positive engagement more than the Standard constitution. This was reflected in the model responses, with the Public model more likely to reframe contentious topics positively and align with collective principles of unbiased, fact-based information.
Implications and Recommendations
The authors highlight the practical benefits and feasibility of integrating public input into LM development. Lower bias scores in the Public model suggest that collective input methodologies can effectively mitigate social biases in AI systems. Furthermore, the focus on accessibility and positive engagement aligns with evolving needs for inclusive AI that respects diverse user perspectives.
However, the paper also underlines the challenges inherent in operationalizing public values, dealing with conflicting principles, and ensuring comprehensive evaluations. The framework's success invites further research to refine public input methods, explore broader demographic inclusivity, and develop direct metrics for principle adherence. Future iterations should consider expanded educational components for participants and adaptive, culturally sensitive deliberation processes.
Conclusion
Huang et al.'s work in CCAI represents a structured, transparent pathway for incorporating public voices into AI development, promoting model behavior that resonates more closely with societal values. This approach not only enhances ethical AI deployment but also paves the way for more participatory and democratic AI governance. The reduction in bias without sacrificing performance marks a significant step toward developing AI systems that are both effective and ethically aligned with the broader human context. The authors encourage further exploration and experimentation within this paradigm to continuously refine and validate the incorporation of public input in AI development processes.