Causes of prompt-induced shifts in LLM-generated label distributions
Investigate the causal mechanisms behind the large shifts in the label distributions that large language models generate when prompt designs vary (for example, when explanations are requested), isolating the relative contributions of model architecture and training regimen, including reinforcement learning from human feedback (RLHF). A minimal measurement sketch follows below.
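One concrete starting point is a fixed-dataset comparison: label the same items under two prompt variants and quantify the shift in the resulting label distribution, repeating this per model while holding prompts and data constant. The Python sketch below is illustrative only, not the protocol from the cited paper; `call_model`, the label space `LABELS`, and the two prompt templates are assumptions standing in for a real LLM API and task.

```python
"""Sketch: measuring prompt-induced label-distribution shift.

Labels the same texts under a plain prompt and an explanation-requesting
prompt, then reports the total variation distance between the two
induced label distributions. Running this across models that differ in
architecture or training (e.g., base vs. RLHF-tuned) keeps the prompt
manipulation fixed while the model varies.
"""

from collections import Counter
from collections.abc import Callable, Iterable

LABELS = ["positive", "neutral", "negative"]  # assumed label space

PROMPT_PLAIN = (
    "Classify the sentiment of this text as positive, neutral, or negative:\n{text}"
)
PROMPT_EXPLAIN = (
    "Classify the sentiment of this text as positive, neutral, or negative, "
    "and explain your reasoning:\n{text}"
)


def label_distribution(labels: Iterable[str]) -> dict[str, float]:
    """Empirical distribution over the fixed label space.

    Assumes each model answer has already been parsed to one of LABELS.
    """
    labels = list(labels)
    counts = Counter(labels)
    n = len(labels)
    return {k: counts.get(k, 0) / n for k in LABELS}


def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Total variation distance: 0 = identical distributions, 1 = disjoint."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in LABELS)


def shift_for_model(call_model: Callable[[str], str], texts: list[str]) -> float:
    """Label the same texts under both prompt variants and return the
    total variation distance between the two label distributions."""
    plain = [call_model(PROMPT_PLAIN.format(text=t)) for t in texts]
    explained = [call_model(PROMPT_EXPLAIN.format(text=t)) for t in texts]
    return total_variation(label_distribution(plain), label_distribution(explained))


if __name__ == "__main__":
    # Toy stand-in model that drifts toward "neutral" when asked to
    # explain, purely to exercise the measurement code.
    def toy_model(prompt: str) -> str:
        return "neutral" if "explain" in prompt else "positive"

    texts = ["I loved it.", "It was fine.", "Terrible service."]
    print(f"TV distance between prompt variants: {shift_for_model(toy_model, texts):.2f}")
```

Comparing the resulting shift magnitudes across model families, and between base and RLHF-tuned checkpoints of the same family, would give a first-order decomposition of architecture versus training effects, though fully isolating causes would require matched checkpoints that differ only in the training stage of interest.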
References
"The reasons behind these shifts are unclear and could be due to differences in model architecture or the nature of training, including reinforcement learning from human feedback (RLHF)."
— Atreja et al., "Prompt Design Matters for Computational Social Science Tasks but in Unpredictable Ways" (arXiv:2406.11980, 17 Jun 2024), Section 5.2 (Implications of Prompting for CSS)