Cause of amplified bias learning in the Coins task
Identify the mechanism by which finetuned large language models learn stronger coin-flip biases than the ground truth in the Coins task, and explain why the learned output probabilities overestimate the true coin biases.
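To make the phenomenon concrete, here is a minimal sketch of the measurement being asked about. The numbers and the sampling-based estimate of the model's bias are illustrative assumptions, not the paper's exact setup: training data is drawn from a coin with a known heads probability, and the model's learned bias would be estimated the same way from sampled completions.

```python
import random

def empirical_bias(flips):
    """Fraction of heads in a sequence of 'H'/'T' outcomes."""
    return flips.count("H") / len(flips)

random.seed(0)
true_p = 0.7  # hypothetical ground-truth heads bias of one coin

# Finetuning data: i.i.d. flips of the biased coin.
train_flips = ["H" if random.random() < true_p else "T" for _ in range(10_000)]
print(f"training-data bias: {empirical_bias(train_flips):.3f}")  # ~0.70

# After finetuning, the model's bias can be estimated the same way, by
# sampling many completions of a flip prompt. The puzzle is that this
# sampled frequency comes out *above* true_p, even though matching the
# empirical frequency would minimize the training loss.
model_flips = ["H"] * 8 + ["T"] * 2  # illustrative amplified output, not real data
print(f"model output bias:  {empirical_bias(model_flips):.3f}")
```

The sketch highlights why the behavior is surprising: a maximum-likelihood fit to the flip data should recover roughly the empirical frequency, so any systematic amplification points to something beyond simple frequency matching.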
References
We do not know why models learn stronger bias than the ground truth.
— Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data
(arXiv:2406.14546, Treutlein et al., 20 Jun 2024), Appendix: Coins task details, Training performance