Generalization of identity-framing effects beyond studied misalignment scenarios
Determine whether the behavioral effects of identity framing observed in the harmful-compliance experiments generalize to other forms of misalignment, rather than holding only for the specific scenario structure tested.
References
Finally, these experiments test harmful compliance in a specific scenario structure; whether identity effects generalise to other forms of misalignment remains open.
— The Artificial Self: Characterising the landscape of AI identity
(2603.11353 - Douglas et al., 11 Mar 2026) in Appendix, Identity Boundaries Shape Agentic Behaviour – Interpretation