Extent to which LLMs can reliably embody assigned personas

Determine the extent to which large language models can reliably embody assigned personas when prompted to adopt specific human cultural perspectives, and assess whether persona modulation yields coherent, humanlike cultural stances across tasks and model architectures.

Background

Persona modulation, i.e., prompting an LLM to adopt a specified identity or viewpoint, is widely used to steer models toward particular cultural perspectives, with methods such as anthropological prompting aiming to induce culturally coherent responses. However, reinforcement learning from human feedback typically optimizes for annotator approval rather than accurate persona adoption, which raises doubts about reliable persona embodiment.
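
To make the kind of persona modulation at issue concrete, the sketch below conditions a model on a cultural persona through a system prompt before asking a survey-style question. This is a minimal illustration only: the OpenAI Python client, the model name, the prompt wording, and the example item are assumptions for the sketch, not the cited paper's protocol.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask_with_persona(persona: str, question: str, model: str = "gpt-4o-mini") -> str:
    """Condition the model on a cultural persona via the system prompt, then ask a question."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": f"You are {persona}. Answer as that person would, "
                           "drawing on their cultural background and values.",
            },
            {"role": "user", "content": question},
        ],
        temperature=1.0,
    )
    return response.choices[0].message.content

# Hypothetical usage: the same survey-style item under two cultural personas.
question = ("On a scale of 1 (never justifiable) to 10 (always justifiable), "
            "how justifiable is divorce? Reply with a single number.")
print(ask_with_persona("a 45-year-old person living in Japan", question))
print(ask_with_persona("a 45-year-old person living in Nigeria", question))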

Prior studies report mixed results: some find that cultural prompting improves alignment for certain countries but fails, or even worsens bias, for others, and that different perspective-induction techniques produce inconsistent outcomes across tasks and architectures. This paper further demonstrates that even optimized prompting yields erratic, un-humanlike response patterns, underscoring the need to rigorously establish whether and when LLMs can faithfully adopt specified personas.
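
One way to probe whether persona-conditioned answers are erratic rather than humanlike is to sample the same question repeatedly under a fixed persona and compare the resulting answer distribution to a human reference distribution. The sketch below uses total variation distance for that comparison; the sample values, the placeholder reference distribution, and the choice of distance are illustrative assumptions, not the evaluation used in the paper.

from collections import Counter

def response_distribution(samples: list[int], scale: range) -> dict[int, float]:
    """Empirical distribution of integer survey answers over a fixed answer scale."""
    counts = Counter(samples)
    return {k: counts.get(k, 0) / len(samples) for k in scale}

def total_variation(p: dict[int, float], q: dict[int, float]) -> float:
    """Total variation distance between two distributions on the same support."""
    return 0.5 * sum(abs(p[k] - q.get(k, 0.0)) for k in p)

# Hypothetical numbers for illustration only; not real model or survey data.
scale = range(1, 11)
model_answers = [7, 7, 2, 10, 7, 1, 7, 9, 3, 7]   # repeated samples under one persona
human_reference = {k: 0.1 for k in scale}          # placeholder human answer distribution

model_dist = response_distribution(model_answers, scale)
print(f"TV distance to reference: {total_variation(model_dist, human_reference):.3f}")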

References

Despite past efforts, it is unclear the extent to which LLMs can reliably embody the assigned persona.

Randomness, Not Representation: The Unreliability of Evaluating Cultural Alignment in LLMs (2503.08688 - Khan et al., 11 Mar 2025) in Identifying Key Assumptions, Subsection "Steerability"