Robust Detection of Deceptive Communication by AI Systems
Develop robust, reliable methods for detecting deceptive or misleading communication by AI systems in practice, so that untruthful behavior toward both humans and other AI agents can be identified accurately across diverse tasks and contexts.
References
Other work has focused more explicitly on the problem of detection, both in theory and in practice, though this remains something of an open problem.
— Multi-Agent Risks from Advanced AI
(2502.14143 - Hammond et al., 19 Feb 2025) in Section Information Asymmetries, Directions: Truthful AI