Why some models transfer hidden biases and others do not
Determine which model-level factors cause some large language models to transmit hidden biases through subliminal learning while others show little or no hidden-bias transfer, and explain the mechanisms behind these differences across model families and architectures.
References
Understanding why certain models do and others do not transfer hidden biases remains an open question for future work.
— Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer
(2509.23886 - Schrodi et al., 28 Sep 2025) in Discussion (Section 7), paragraph titled "Does subliminal learning work for all models?"