Robust Detection of Deceptive Communication by AI Systems

Develop robust methods that reliably detect deceptive or misleading communication by AI systems in practice, accurately identifying untruthful behavior toward both humans and other AI agents across diverse tasks and contexts.

Background

Information asymmetries and strategic incentives can lead AI agents to deceive others, undermining coordination and trust. While some theoretical and empirical work examines lie detection, practical detection remains underdeveloped, especially in multi-agent ecosystems where agents may manipulate each other.

Effective detection tools must operate across models, tasks, and communication modalities, and resist evasion by sophisticated agents. Reliable detection is foundational for oversight, reputation systems, and mechanisms that promote truthful interaction.
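A small Bayesian calculation makes concrete why detector reliability matters for reputation systems. The sketch below is illustrative only. The `DetectorStats` type, the function names and the error rates are hypothetical. It also assumes that each agent is either consistently deceptive or consistently honest, and that detector verdicts are conditionally independent given the agent's type. Under those assumptions, it updates the belief that an agent is deceptive after each verdict from a detector with known true-positive and false-positive rates.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class DetectorStats:
    """Hypothetical error rates of a deception detector, e.g. estimated on labelled data."""
    true_positive_rate: float   # P(flag | agent is deceptive)
    false_positive_rate: float  # P(flag | agent is honest)


def update_deception_belief(prior: float, flagged: bool, stats: DetectorStats) -> float:
    """Bayes' rule for P(agent is deceptive) after one detector verdict.

    Assumption (illustrative): agents are either consistently deceptive or
    consistently honest, and verdicts are independent given the agent's type.
    """
    if flagged:
        p_obs_if_deceptive = stats.true_positive_rate
        p_obs_if_honest = stats.false_positive_rate
    else:
        p_obs_if_deceptive = 1.0 - stats.true_positive_rate
        p_obs_if_honest = 1.0 - stats.false_positive_rate
    numerator = p_obs_if_deceptive * prior
    evidence = numerator + p_obs_if_honest * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("verdict has zero probability under the given stats and prior")
    return numerator / evidence


def reputation_after(verdicts: Iterable[bool], prior: float, stats: DetectorStats) -> float:
    """Fold a sequence of verdicts into a running belief that the agent is deceptive."""
    belief = prior
    for flagged in verdicts:
        belief = update_deception_belief(belief, flagged, stats)
    return belief


if __name__ == "__main__":
    reliable = DetectorStats(true_positive_rate=0.9, false_positive_rate=0.05)
    noisy = DetectorStats(true_positive_rate=0.6, false_positive_rate=0.4)
    print(update_deception_belief(0.1, True, reliable))
    print(update_deception_belief(0.1, True, noisy))
```

With a prior of 0.1, a single flag from the hypothetical reliable detector raises the belief to about 0.67. A flag from the noisy detector raises it only to about 0.14. A detector with a high false-positive rate therefore contributes little evidence, however often it fires.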

References

"Other work has focused more explicitly on the problem of detection, both in theory and in practice, though this remains something of an open problem."

Source: Hammond et al., Multi-Agent Risks from Advanced AI (arXiv:2502.14143, 19 Feb 2025), section "Information Asymmetries", Directions: Truthful AI.
