Potential non-human strategic preferences in Solar and Mistral
Investigate whether Solar 10.7B and Mistral 7B exhibit distinctly non-human strategic preferences in contexts beyond those studied, and characterize the conditions under which such divergences arise.
References
It is probable, though not established, that in some circumstances these models may have distinctly non-human strategic preferences.
Source: "Do Large Language Models Learn Human-Like Strategic Preferences?" (2404.08710, Roberts et al., 11 Apr 2024), Section 7.1, Limitations