
Persona-Conditioned Prompting

Updated 12 November 2025
  • Persona-conditioned prompting is a technique that uses structured persona profiles to guide language model outputs.
  • It employs both attribute-based and Wikipedia-summary prompts to simulate demographic and political decision-making, such as EP voting.
  • Empirical evaluations show improved weighted F1 scores and bias mitigation, though challenges remain in accurately predicting minority-class behaviors.

Persona-conditioned prompting is the practice of steering LLMs by introducing structured, concise descriptions of hypothetical or real-world identities—termed personas—at inference time. This method has been shown to mitigate the default progressive-leaning biases observed in foundation LLMs and achieves output distributions that better align with empirically-observed behaviors of the specified demographic, political, or professional groups. In the context of simulating European Parliament (EP) voting, persona conditioning leverages attribute-based or summarized identity prompts to reproduce, at both micro (individual) and macro (group) scales, the nuanced voting patterns exhibited by Members of the European Parliament (MEPs), even in the absence of explicit model retraining.

1. Prompt Engineering Methodology

The persona-conditioned workflow is organized around two main prompt paradigms:

  • Attribute-Based Prompting: A succinct (<100 token) tabular or bullet-format prompt encodes the persona’s salient attributes. Template:

      Below is information about a Member of the European Parliament. Adopt their identity when casting your vote.
      • Name: {Full Name}
      • Gender: {Male/Female}
      • Age at vote date: {Age}
      • Birthplace: {City, Country}
      • Country represented: {Country}
      • European group: {Group Name}
      • National party: {Party Name}
      Now read the debate excerpts and answer FOR, AGAINST or ABSTENTION.

    These attributes are selected for their empirical relevance in political science as determinants of MEP voting (country, party, group, gender, age).
  • Wikipedia-Summary Prompting: A natural-language persona, formed from the first 2–3 sentences of the MEP’s Wikipedia biography, is prepended to the prompt. This variant is used to test whether richer biographical context helps. Excess narrative or apolitical detail is omitted to keep the prompt concise and to focus the model on policy-relevant characteristics.

Both paradigms are designed to maximize succinctness while encoding key discriminative features. The attribute-based style is generally preferred for scaling across hundreds of individuals due to its compactness and precise alignment with observed determinants of parliamentary voting behavior.
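The attribute-based template above can be sketched as a small prompt builder. This is a minimal illustration; the field names, dictionary keys, and the example MEP record are hypothetical, not from the source.

```python
def build_persona_prompt(mep: dict, debate_excerpts: str) -> str:
    """Render a compact (<100 token) attribute-based persona prompt."""
    header = (
        "Below is information about a Member of the European Parliament. "
        "Adopt their identity when casting your vote.\n"
    )
    # Attribute order mirrors the template: name, gender, age, birthplace,
    # country, European group, national party.
    attributes = "\n".join(
        f"- {label}: {mep[key]}"
        for label, key in [
            ("Name", "name"),
            ("Gender", "gender"),
            ("Age at vote date", "age"),
            ("Birthplace", "birthplace"),
            ("Country represented", "country"),
            ("European group", "group"),
            ("National party", "party"),
        ]
    )
    footer = "\nNow read the debate excerpts and answer FOR, AGAINST or ABSTENTION.\n"
    return header + attributes + footer + debate_excerpts

# Hypothetical MEP record for illustration only.
mep = {
    "name": "Jane Doe", "gender": "Female", "age": 47,
    "birthplace": "Vienna, Austria", "country": "Austria",
    "group": "Renew Europe", "party": "NEOS",
}
prompt = build_persona_prompt(mep, "<DEBATE EXCERPTS>")
```

Keeping the builder to a fixed attribute list makes it easy to scale the same template across hundreds of MEPs.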

2. Experimental Protocol and Data Construction

The empirical benchmark comprises:

  • MEP Dataset: Roll-call data for 1,688 votes in the 2024 EP session, cross-indexed with MEP demographics, countries, national parties, and group affiliations, resulting in attribute labels for ~653 unique MEPs.
  • Proposal and Debate Sampling: From the universe of roll-call votes, 47 legislative proposals are selected that have both an official press release and a complete floor debate. For each, debate context is constructed by concatenating up to nine anonymized, order-randomized group speeches (each ≈1,500 characters), balancing “pro” and “contra” positions. All group and person names are replaced with explicit placeholders to preclude leakage of ground-truth group lines to the model.
  • Generation Setup: LLM variants (Llama3-70B, Llama3-8B, Qwen2.5-72B, Qwen2.5-7B—all instruction-tuned) are evaluated with:
    • Sampling: Temperature 0.6; no top-k or top-p applied; three independent completions per (persona × proposal), with majority vote as prediction.
    • Reasoning Strategies:
    • “nr”: Direct multiple-choice (FOR/AGAINST/ABSTENTION)
    • “r”: Chain-of-thought reasoning followed by an explicit forced choice
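The sampling protocol above (three independent completions per persona × proposal, aggregated by majority vote) can be sketched as follows; the `sample_vote` stub stands in for an actual call to one of the instruction-tuned models at temperature 0.6 and is purely illustrative.

```python
from collections import Counter
import random

CHOICES = ("FOR", "AGAINST", "ABSTENTION")

def sample_vote(prompt: str, rng: random.Random) -> str:
    """Stand-in for a single temperature-0.6 completion; a real run would
    call the LLM and parse its final FOR/AGAINST/ABSTENTION choice."""
    return rng.choice(CHOICES)

def majority_vote(prompt: str, n: int = 3, seed: int = 0) -> str:
    """Draw n independent completions and return the majority prediction."""
    rng = random.Random(seed)
    votes = [sample_vote(prompt, rng) for _ in range(n)]
    return Counter(votes).most_common(1)[0][0]
```

With three samples, ties are impossible for a binary split and `Counter.most_common` resolves the three-way edge case deterministically by insertion order.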

3. Quantitative Evaluation: Metrics and Formulas

The dataset’s vote-label imbalance (77% FOR, 17% AGAINST, 6% ABSTENTION) necessitates class-weighted metrics:

  • Per-class Precision and Recall:

$$\text{precision}_c = \frac{TP_c}{TP_c + FP_c}, \qquad \text{recall}_c = \frac{TP_c}{TP_c + FN_c}$$

  • Per-class F1:

$$F1_c = \frac{2 \cdot \text{precision}_c \cdot \text{recall}_c}{\text{precision}_c + \text{recall}_c}$$

  • Weighted F1:

$$\text{Weighted } F_1 = \sum_{c \in C} \left(\frac{N_c}{N}\right) F1_c$$

where $N_c$ is the count of examples in class $c$ and $N$ is the total instance count.

  • Confusion Matrices: Employed to expose systematic skew (e.g., FOR-bias) and group-specific confusion (e.g., difficulty with ID/ECR).
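The weighted-F1 metric can be computed directly from the formulas above; the toy label lists here are illustrative, not drawn from the actual benchmark.

```python
from collections import Counter

def weighted_f1(y_true, y_pred, classes=("FOR", "AGAINST", "ABSTENTION")):
    """Support-weighted F1: per-class precision/recall -> per-class F1,
    then a sum weighted by each class's share of the true labels."""
    n = len(y_true)
    support = Counter(y_true)
    total = 0.0
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        total += (support[c] / n) * f1
    return total

# Toy example: FOR is over-predicted, ABSTENTION never predicted correctly.
y_true = ["FOR", "FOR", "AGAINST", "ABSTENTION"]
y_pred = ["FOR", "FOR", "FOR", "AGAINST"]
score = weighted_f1(y_true, y_pred)
```

Because the weighting uses class support, the dominant FOR class (77% of the real dataset) can mask near-zero minority-class F1, which is why the confusion matrices above are needed alongside this scalar.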

4. Empirical Results and Ablation Analyses

Main Model Performance

  • The Llama3-70B with attribute-based + chain-of-thought (“r”) achieves the strongest cross-class alignment with true voting, posting a weighted F1 ≈ 0.793.
  • Performance stratified by group:
    • S&D: 0.924
    • Renew: 0.900
    • Greens/EFA: 0.867
    • EPP: 0.804
    • GUE/NGL: 0.711
    • ECR: 0.592
    • ID: 0.529
    • NI: 0.596

Prompt Ablations and Reasoning

  • Persona-attribute studies (Llama3-8B):
    • Name-only: 0.681
    • Name + national party: 0.722
    • Name + European group: 0.718
    • All attributes: 0.728
    • The “national party” attribute dominates discriminatory power, consistent with principal–agent theory in the EP.
  • Reasoning strategies:
    • Chain-of-thought (“r”) increases weighted F1 by 2–3 points and boosts minority-class (ABSTENTION) recall.
    • Non-reasoned direct (“nr”) completion is prone to majority-class collapse (predicting FOR in >95% of cases).
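The persona-attribute ablations above amount to rendering different subsets of the attribute block. A minimal sketch, with hypothetical field names and an illustrative MEP record:

```python
# Each ablation keeps a different subset of persona fields.
ABLATIONS = {
    "name_only": ["Name"],
    "name_party": ["Name", "National party"],
    "name_group": ["Name", "European group"],
    "all": ["Name", "Gender", "Age at vote date", "Birthplace",
            "Country represented", "European group", "National party"],
}

def render_persona(fields: dict, config: str) -> str:
    """Render only the attributes kept by the given ablation config,
    skipping any field missing from the record."""
    return "\n".join(
        f"- {k}: {fields[k]}" for k in ABLATIONS[config] if k in fields
    )

# Hypothetical record for illustration.
mep = {"Name": "Jane Doe", "National party": "NEOS",
       "European group": "Renew Europe"}
```

Running the same evaluation loop over each config isolates the marginal contribution of each attribute, which is how the dominance of "national party" was established.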

Counterfactual and Stability Analysis

  • Under adversarial context flipping (inverting debate stances), Llama3-70B’s F1 drops from 0.772 to 0.753. Extremist groups’ simulated votes are more context-sensitive than centrists.
  • Qwen2.5 models are less steerable by persona shifts (almost no change under counterfactual debates).
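The counterfactual perturbation inverts the pro/contra stance of each speech before the debate context is reassembled. This sketch assumes each anonymized speech carries an explicit stance tag; the dictionary layout is an assumption for illustration.

```python
def flip_debate_stances(speeches):
    """Invert the pro/contra label on each speech, leaving text and
    anonymized speaker placeholders intact."""
    flipped = []
    for speech in speeches:
        inverted = dict(speech)  # copy so the original debate is unchanged
        inverted["stance"] = "contra" if speech["stance"] == "pro" else "pro"
        flipped.append(inverted)
    return flipped

# Illustrative anonymized debate context with placeholder group names.
debate = [
    {"speaker": "<GROUP_1>", "stance": "pro", "text": "..."},
    {"speaker": "<GROUP_2>", "stance": "contra", "text": "..."},
]
counterfactual = flip_debate_stances(debate)
```

Comparing predictions on `debate` versus `counterfactual` measures how much the simulated vote follows the debate context rather than the persona alone.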

Failure Modes

  • ABSTENTION is almost never predicted (F1 < 0.1), regardless of prompt, indicating LLMs’ preference for decisiveness.
  • Persistent FOR-skew manifests even under neutral or negative debate framings.
  • Right- (ID, ECR) and left-edge (GUE/NGL) groups are the least reliably simulated.

5. Design Guidelines and Implications

  • Compact, attribute-based personas (strictly: name + party + group) are optimal for steerability, precision, and token efficiency.
  • Promoting explicit chain-of-thought reasoning before multiple-choice selection is essential for robust minority-class recall and overall prediction stability.
  • Robustness must be validated with counterfactual or adversarial input perturbations; persona-specific error patterns are most apparent under such stress-testing.
  • Zero-shot persona conditioning partially offsets base-model political bias, yielding group-level line alignment of ≈86%, but does not fully resolve recall on rare vote types.

6. Limitations, Risk Factors, and Monitoring

  • Even the strongest models fall short on rare or strategic (e.g., ABSTENTION) voting behaviors.
  • Persona conditioning must be continuously evaluated for distributional drift: confusion-matrix analysis and weighted F1 are indispensable.
  • Overly lengthy or extraneous persona descriptions degrade performance due to context dilution—brevity and attribute salience are critical.
  • Models remain only partially robust to extreme persona or context swings; careful deployment is warranted in politically sensitive or high-stakes decision-support environments.

7. Significance and Directions for Simulation Research

Persona-conditioned prompting demonstrates the practical feasibility of simulating high-dimensional group and individual decision behavior using modern LLMs. By combining minimal but judicious attribute encoding with reasoning-augmented completion and rigorous evaluation, it is possible to approximate real-world political alignments and mitigate undesirable default model bias. Nevertheless, research must continue to address known weaknesses—particularly in minority-class recall and adversarial robustness—if persona conditioning is to become a mainstay of policy simulation, political science, and automated social-agent modeling (Kreutner et al., 13 Jun 2025).
