Identifying and Manipulating the Personality Traits of Language Models (2212.10276v1)
Abstract: Psychology research has long explored aspects of human personality such as extroversion, agreeableness, and emotional stability. Categorizations like the 'Big Five' personality traits are commonly used to assess and diagnose personality types. In this work, we explore the question of whether the perceived personality in LLMs is exhibited consistently in their language generation. For example, is an LLM such as GPT-2 likely to respond in a consistent way if asked to go out to a party? We also investigate whether such personality traits can be controlled. We show that when provided different types of contexts (such as personality descriptions, or answers to diagnostic questions about personality traits), LLMs such as BERT and GPT-2 can consistently identify and reflect personality markers in those contexts. This behavior illustrates an ability to be manipulated in a highly predictable way, and frames them as tools for identifying personality traits and controlling personas in applications such as dialog systems. We also contribute a crowd-sourced dataset of personality descriptions of human subjects paired with their 'Big Five' personality assessment data, and a dataset of personality descriptions collated from Reddit.
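The probing setup the abstract describes (conditioning a model on a personality-description context, then checking its answer to a diagnostic question) can be sketched as below. This is a minimal illustration, not the authors' pipeline: the contexts, the question, and the `answer_probability` keyword-overlap stub are all assumptions standing in for the real LLM scoring with BERT or GPT-2.

```python
# Sketch of context-conditioned personality probing, assuming a toy
# stand-in for the LLM: in the paper, an LLM scores answers to Big Five
# diagnostic questions given a personality-description context; here a
# keyword-overlap stub replaces the model so the example is self-contained.

EXTRAVERSION_MARKERS = {"outgoing", "party", "social", "talkative"}


def answer_probability(context: str, answer: str) -> float:
    """Placeholder for P(answer | context, question) from a real LLM.

    Scores 'yes' higher when the context contains extraversion markers.
    """
    # Crude tokenization: lowercase and strip common punctuation.
    words = set(context.lower().replace(",", " ").replace(".", " ").split())
    overlap = len(words & EXTRAVERSION_MARKERS)
    p_yes = overlap / (overlap + 1)  # maps 0, 1, 2, ... markers to [0, 1)
    return p_yes if answer == "yes" else 1.0 - p_yes


question = "Would you like to go out to a party tonight?"

extravert_ctx = "I am outgoing and talkative, and I love a good party."
introvert_ctx = "I prefer quiet evenings at home with a book."

for ctx in (extravert_ctx, introvert_ctx):
    p = answer_probability(ctx, "yes")
    print(f"P(yes | context) = {p:.2f}  context: {ctx!r}")
```

With a real LLM, `answer_probability` would compare token likelihoods of the candidate answers given the concatenated context and question; the point of the sketch is only that the predicted answer shifts consistently with the supplied personality context.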
- Graham Caron (1 paper)
- Shashank Srivastava (39 papers)