
Integrating Counterfactual Simulations with Language Models for Explaining Multi-Agent Behaviour (2505.17801v1)

Published 23 May 2025 in cs.AI

Abstract: Autonomous multi-agent systems (MAS) are useful for automating complex tasks but raise trust concerns due to risks like miscoordination and goal misalignment. Explainability is vital for trust calibration, but explainable reinforcement learning for MAS faces challenges in state/action space complexity, stakeholder needs, and evaluation. Using the counterfactual theory of causation and LLMs' summarisation capabilities, we propose Agentic eXplanations via Interrogative Simulation (AXIS). AXIS generates intelligible causal explanations for pre-trained multi-agent policies by having an LLM interrogate an environment simulator using queries like 'whatif' and 'remove' to observe and synthesise counterfactual information over multiple rounds. We evaluate AXIS on autonomous driving across 10 scenarios for 5 LLMs with a novel evaluation methodology combining subjective preference, correctness, and goal/action prediction metrics, and an external LLM as evaluator. Compared to baselines, AXIS improves perceived explanation correctness by at least 7.7% across all models and goal prediction accuracy by 23% for 4 models, with improved or comparable action prediction accuracy, achieving the highest scores overall.

Authors (4)
  1. Bálint Gyevnár (1 paper)
  2. Christopher G. Lucas (25 papers)
  3. Stefano V. Albrecht (73 papers)
  4. Shay B. Cohen (78 papers)

Summary

Integration of Counterfactual Simulations with LLMs for Explainability in Multi-Agent Systems

This paper presents a novel approach to improving the explainability of autonomous multi-agent systems (MAS) through a framework called Agentic eXplanations via Interrogative Simulation (AXIS). The primary focus is on addressing trust concerns in MAS, particularly those arising from miscoordination and goal misalignment, by combining the summarisation capabilities of LLMs with counterfactual simulations.

Overview of AXIS Framework

The AXIS framework integrates counterfactual reasoning with the summarisation capabilities of LLMs to generate explanations that help stakeholders understand MAS behaviour. An LLM interrogates an environment simulator through queries such as 'whatif' and 'remove', observes the counterfactual outcomes, and synthesises them into causal explanations. This interrogation runs over multiple rounds, allowing the LLM to gather evidence and refine its explanation iteratively, as the sketch below illustrates.
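The loop can be pictured as follows. This is a minimal sketch rather than the paper's implementation: `CounterfactualSimulator`, `propose_query`, and `synthesise_explanation` are hypothetical names standing in for the paper's simulator interface and prompting scheme, and only the 'whatif' and 'remove' queries named in the abstract are shown.

```python
# Minimal sketch of the AXIS interrogation loop. The simulator wrapper and
# LLM helper methods below are illustrative assumptions, not the paper's API.

class CounterfactualSimulator:
    """Hypothetical replay wrapper around a logged multi-agent episode."""

    def __init__(self, trajectory):
        self.trajectory = trajectory  # logged (state, joint_action) pairs

    def whatif(self, agent_id: int, action: str) -> str:
        # Re-run the episode with `agent_id` forced to take `action` and
        # return a textual summary of how the outcome changed.
        return f"outcome if agent {agent_id} had taken '{action}': ..."

    def remove(self, agent_id: int) -> str:
        # Re-run the episode with `agent_id` deleted from the scene.
        return f"outcome with agent {agent_id} removed: ..."

def axis_explain(llm, simulator: CounterfactualSimulator,
                 question: str, max_rounds: int = 5) -> str:
    """Let the LLM gather counterfactual evidence, then synthesise an answer.

    `llm.propose_query` and `llm.synthesise_explanation` are assumed helpers:
    the first returns a (query_name, *args) tuple or None when satisfied,
    the second turns the accumulated evidence into a causal explanation.
    """
    evidence = []
    for _ in range(max_rounds):
        query = llm.propose_query(question, evidence)
        if query is None:  # the LLM signals it has enough evidence
            break
        name, *args = query  # e.g. ("whatif", 2, "yield")
        result = getattr(simulator, name)(*args)
        evidence.append(f"{name}{tuple(args)} -> {result}")
    return llm.synthesise_explanation(question, evidence)
```

The key design choice this captures is that the LLM never reasons about counterfactuals from its parametric knowledge alone; every causal claim is grounded in an actual simulator rollout gathered during the interrogation rounds.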

Evaluation and Results

The paper evaluates AXIS on autonomous driving scenarios, a domain where safety and trust are particularly critical. Compared to baseline approaches, AXIS improves the perceived correctness of explanations by at least 7.7% across all tested models and raises goal prediction accuracy by 23% for four of the five models, with improved or comparable action prediction accuracy. This demonstrates AXIS's effectiveness in providing more intelligible and actionable insights into MAS behaviour that align more closely with human expectations.

Furthermore, the evaluation methodology combines subjective measures (user preference and perceived correctness) with objective metrics (goal and action prediction accuracy), offering a comprehensive assessment of explanation effectiveness. Claude 3.5 serves as an external LLM evaluator, approximating expert judgment of explanation quality.
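A sketch of how such a mixed evaluation might be aggregated is shown below. The judge interface (`rate_correctness`, `predict_goal`, `predict_action`) is assumed for illustration; the paper's actual prompts and scoring rubric are not reproduced here.

```python
# Illustrative aggregation of the combined evaluation, assuming a hypothetical
# judge LLM client; not the paper's exact protocol.

def accuracy(predictions, ground_truth):
    """Fraction of items where the prediction matches the label."""
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)

def evaluate_explanations(judge_llm, explanations, labels):
    """Aggregate subjective (LLM-judged) and objective metrics for one method."""
    correctness, goal_preds, action_preds = [], [], []
    for expl in explanations:
        # Subjective: the judge LLM rates perceived correctness (e.g. on 1-5).
        correctness.append(judge_llm.rate_correctness(expl))
        # Objective: the judge LLM infers the explained agent's goal and next
        # action from the explanation alone; accuracy is scored against labels.
        goal_preds.append(judge_llm.predict_goal(expl))
        action_preds.append(judge_llm.predict_action(expl))
    return {
        "perceived_correctness": sum(correctness) / len(correctness),
        "goal_accuracy": accuracy(goal_preds, [l["goal"] for l in labels]),
        "action_accuracy": accuracy(action_preds, [l["action"] for l in labels]),
    }
```

Scoring goal and action prediction from the explanation alone is what makes the objective metrics meaningful: an explanation that lets a reader (here, the judge LLM) anticipate the agent's behaviour has demonstrably captured its causal structure.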

Implications and Future Directions

The results suggest that AXIS provides a promising direction for enhancing transparency and trust in MAS. By using counterfactual simulations, the framework offers explanations that align more closely with human reasoning, addressing one of the significant hurdles in explainable reinforcement learning. This work indicates potential applications in various domains where MAS are used, such as finance or social media, by tailoring the system to different types of agents and environments.

Future research could extend AXIS to broader classes of MAS beyond autonomous driving and to other areas of AI governance. There is also room for further optimisation, such as refining the interrogation mechanism or improving context feature selection to boost explanation precision. Integration with real-world MAS deployments would provide more empirical data, sharpening the framework's capabilities and increasing its robustness to the complexity of diverse, real-world environments.

Conclusion

The AXIS framework represents a significant step towards more transparent and accurate explanations in multi-agent systems using LLMs. Its integration of counterfactual simulations ensures that AI-driven decisions are presented in a way that is meaningful and actionable for stakeholders, enhancing their ability to trust and interact with these systems effectively. Through rigorous evaluation, the paper demonstrates that AXIS is a viable approach that could be key in translating complex agent strategies into comprehensible insights, marking a valuable contribution to the field of explainable AI.
