Robust Coordination under Misaligned Communication via Power Regularization

Published 9 Apr 2024 in cs.MA (arXiv:2404.06387v2)

Abstract: Effective communication in Multi-Agent Reinforcement Learning (MARL) can significantly enhance coordination and collaborative performance in complex and partially observable environments. However, reliance on communication can also introduce vulnerabilities when agents are misaligned, potentially leading to adversarial interactions that exploit implicit assumptions of cooperative intent. Prior work has addressed adversarial behavior through power regularization, controlling the influence one agent exerts over another, but has largely overlooked the role of communication in these dynamics. This paper introduces Communicative Power Regularization (CPR), extending power regularization specifically to communication channels. By explicitly quantifying and constraining agents' communicative influence during training, CPR actively mitigates vulnerabilities arising from misaligned or adversarial communication. Evaluations across the benchmark environments Red-Door-Blue-Door, Predator-Prey, and Grid Coverage demonstrate that our approach significantly enhances robustness to adversarial communication while preserving cooperative performance, offering a practical framework for secure and resilient cooperative MARL systems.

References (21)
  1. The emergence of adversarial communication in multi-agent reinforcement learning. In Conference on Robot Learning, pp. 1394–1414. PMLR, 2021.
  2. Multi-agent adversarial attacks for multi-channel communications. arXiv preprint arXiv:2201.09149, 2022.
  3. LIIR: Learning individual intrinsic reward in multi-agent reinforcement learning. Advances in Neural Information Processing Systems, 32, 2019.
  4. Adversarial attacks in consensus-based multi-agent reinforcement learning. In 2021 American Control Conference (ACC), pp. 3050–3055. IEEE, 2021.
  5. Counterfactual multi-agent policy gradients. In Proceedings of the AAAI Conference on Artificial Intelligence, 2018.
  6. Towards comprehensive testing on the robustness of cooperative multi-agent reinforcement learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 115–122, 2022.
  7. Sparse adversarial attack in multi-agent reinforcement learning. arXiv preprint arXiv:2205.09362, 2022.
  8. Reward redistribution mechanisms in multi-agent reinforcement learning. In Adaptive Learning Agents Workshop at the International Conference on Autonomous Agents and Multiagent Systems, 2020.
  9. Social influence as intrinsic motivation for multi-agent deep reinforcement learning. In Kamalika Chaudhuri and Ruslan Salakhutdinov (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 3040–3049. PMLR, 2019. URL https://proceedings.mlr.press/v97/jaques19a.html.
  10. Model-free conventions in multi-agent reinforcement learning with heterogeneous preferences. arXiv preprint arXiv:2010.09054, 2020.
  11. The benefits of power regularization in cooperative reinforcement learning. In Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, pp. 457–465, 2023.
  12. Attacking cooperative multi-agent reinforcement learning by adversarial minority influence, 2023.
  13. On the robustness of cooperative multi-agent reinforcement learning. In 2020 IEEE Security and Privacy Workshops (SPW), pp. 62–68, 2020. doi: 10.1109/SPW50608.2020.00027.
  14. Learning to ground multi-agent communication with autoencoders. Advances in Neural Information Processing Systems, 34:15230–15242, 2021.
  15. A review of cooperative multi-agent deep reinforcement learning. Applied Intelligence, 53(11):13677–13722, 2023.
  16. A theory of mind approach as test-time mitigation against emergent adversarial communication. In Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, pp. 2842–2844, 2023.
  17. Learning when to communicate at scale in multiagent cooperative and competitive tasks. arXiv preprint arXiv:1812.09755, 2018.
  18. Learning multiagent communication with backpropagation. Advances in Neural Information Processing Systems, 29, 2016.
  19. Adversarial attacks on multi-agent communication. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7768–7777, 2021.
  20. Deconstructing cooperation and ostracism via multi-agent reinforcement learning, 2023.
  21. FOP: Factorizing optimal joint policy of maximum-entropy multi-agent reinforcement learning. In International Conference on Machine Learning, pp. 12491–12500. PMLR, 2021.

Summary

  • The paper introduces Communicative Power Regularization (CPR), which explicitly quantifies and penalizes the influence of communication on agent policies.
  • It demonstrates that agents trained with this method retain task performance even under adversarial or absent communication in benchmarks like Red-Door-Blue-Door and Predator-Prey.
  • The results imply that tuning the regularization coefficient allows for a balance between cooperative performance and resilient, autonomous decision-making.

Robust Coordination under Misaligned Communication via Power Regularization

Introduction

The paper addresses a significant vulnerability in Cooperative Multi-Agent Reinforcement Learning (CoMARL): the dependence of agent policies on communication channels, which can be exploited under objective misalignment. By reframing inter-agent communication through the lens of power (the influence one agent exerts over another's decision-making), the authors introduce power regularization as a principled constraint. This regularization directly mitigates agents' susceptibility to performance impairments caused by adversarial or otherwise misaligned communication. The proposed framework quantifies and modulates the power dynamics embedded in communicative exchanges, enhancing the autonomy and robustness of MARL agents in adversarial or mixed-motivation settings.

Background

Traditional CoMARL algorithms leverage communication for efficient coordination and robust policy learning, exemplified by architectures such as CommNet and IC3Net. However, emergent behaviors in these frameworks include free-riding, irrelevant convention formation, and excessive trust in communicated messages—phenomena that open new avenues for exploitation by misaligned agents. Prior work on adversarial communication in MARL (e.g., Li et al., 2023) has established that performance can severely degrade under targeted attacks; yet defenses mostly focus on adversarial training, consensus robustness, or reward redistribution, lacking explicit mechanisms to modulate communicative dependencies.

Power, as defined in the multi-agent context, encapsulates the expected impact an agent's actions have over another's utility. The recent formalization of power regularization [Li & Dennis, 2023] proposes explicitly penalizing agent states/actions with high opponent-induced utility variance, thus fostering more robust learning in adversarial settings. This work extends that formalism to the explicit regularization of power exerted over communication channels, partitioning agent influence into communicative and implicit (non-communicative) modalities.
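As a rough illustration of this power notion, the following sketch (a toy tabular setting; all names and numbers are ours, not the paper's) scores the power agent j holds over agent i in a single state as the gap between i's expected utility under j's current policy and i's worst-case utility over j's deviations:

```python
import numpy as np

def power_over(q_i, pi_j):
    """Toy estimate of the power agent j holds over agent i in one state.

    q_i  : (n_actions_j,) array of agent i's utilities, indexed by j's action.
    pi_j : (n_actions_j,) array, agent j's current policy over its actions.

    Power is sketched as the gap between i's expected utility when j follows
    its policy and i's worst-case utility when j deviates adversarially.
    """
    expected = float(q_i @ pi_j)   # utility if j plays its current policy
    worst = float(q_i.min())       # utility under j's most harmful action
    return expected - worst        # large gap => j holds high power over i

# Invented numbers: three possible actions for agent j.
q_i = np.array([1.0, 0.9, -0.5])
pi_j = np.array([0.6, 0.3, 0.1])
p = power_over(q_i, pi_j)  # 0.82 - (-0.5) = 1.32
```

The paper's actual power terms are defined over learned Q-functions and opponent deviation policies; this scalar gap only conveys the intuition that high opponent-induced utility variance means high power.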

Power Regularization Over Communication

The core proposal introduces a modified power metric, decomposed into communicative and implicit terms. For an agent $i$ influenced by an agent $j$ through both actions and messages, the Q-value is redefined as

$$Q_i(s,a) = Q_i^{\pi}(s,a) + \lambda \left[ Q_i^{\pi,\,\rho_{i:j}^{\text{implicit}}}(s,a) + Q_i^{\pi,\,\rho_{i:j}^{\text{comm}}}(s,a) \right],$$

where $\lambda$ is a tunable regularization coefficient. The communicative power term models influence exerted via the communication channel, penalizing reliance arising specifically from messages. This regularization can be interpreted as state-based reward shaping, where increased agent autonomy is explicitly targeted by discouraging over-dependence on communicative coordination.
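In code, the regularized value above reduces to a simple combination. A minimal sketch (function and argument names are ours, not the paper's; the power terms are assumed to enter as penalties, i.e. negative when agent j's influence over agent i is harmful):

```python
def regularized_q(q_task, q_power_implicit, q_power_comm, lam):
    """Regularized Q-value from the equation above: the task Q-value plus
    a lambda-weighted sum of the implicit-power and communicative-power
    terms. Larger lam discourages reliance on both action-mediated and
    message-mediated influence from the other agent."""
    return q_task + lam * (q_power_implicit + q_power_comm)

# Invented numbers: a state-action pair with mild implicit influence
# and stronger communicative influence.
q = regularized_q(q_task=1.0, q_power_implicit=-0.2, q_power_comm=-0.3, lam=0.5)
```

Setting lam to zero recovers the unregularized task value, so the cooperative baseline is a special case of the regularized objective.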

Testing is performed in both cooperative (aligned objectives) and competitive/misaligned settings, with the regularization strength $\lambda$ modulating the degree to which agents hedge against communicative adversarial risk.

Experimental Results

Experimental validation is conducted in two canonical MARL domains: Red-Door-Blue-Door and Predator-Prey. The evaluation benchmarks the impact of power regularization on resilience to adversarial communication and the resulting behavioral adaptations.

Cooperative baselines trained without power regularization (e.g., MAPPO, IC3Net) demonstrated pronounced vulnerability to adversarial or omitted communication, exhibiting dramatic performance drops or outright task failure in both domains. By contrast, agents trained with communication-power regularization retained high task success rates under adversarial messaging and were capable of completing tasks even when communication was absent at test time.

Figure 1: The Red-Door-Blue-Door environment used to analyze the impact of misaligned communication and power regularization.

In Red-Door-Blue-Door, policies learned without power regularization were rash and over-eager, immediately acting on trusted communication and thus exposing themselves to sabotage when adversarial communication was present. With power-regularized training, agents adopted more cautious, communication-agnostic strategies that remained robust even as adversarial signals surfaced. The critical metrics here were episode length and reward variance: power regularization led to longer (i.e., more deliberate) episodes and more stable reward distributions across training and adversarial test conditions.

Similarly, in Predator-Prey, classic IC3Net policies failed entirely when communication was corrupted or disabled post-training, while power-regularized policies maintained perfect task completion regardless of the communication regime, demonstrating successful transfer of autonomy and robustness across cooperative and adversarial modalities.

Theoretical and Practical Implications

This methodology provides a systematic approach to mitigating power-vulnerability in MARL, particularly in environments where agent objectives are only partially aligned or subject to adversarial interference. The explicit quantification and penalization of communicative influence—distinguished from implicit power—enables fine-grained control over agent dependency structures. This is especially relevant in high-stakes, safety-critical domains, where overtrust in communication can have catastrophic consequences.

Moreover, the regularization approach smoothly interpolates between pure cooperation and robust autonomy, controlled via the $\lambda$ hyperparameter, enabling domain designers to tune behavioral trade-offs between performance and resilience. This has implications for communication-efficient MARL, composable agent design, and adversarial game theory, as well as for the understanding of emergent conventions and coalitional power in multi-agent populations.
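To make this interpolation concrete, consider a toy example (all numbers invented) in which agent i chooses between a message-trusting strategy with high task value but a large communicative-power penalty, and an autonomous strategy with lower task value and no penalty; the preferred strategy flips as the coefficient grows:

```python
def pick_strategy(lam):
    """Toy illustration of tuning the regularization coefficient: scores
    two candidate strategies with the regularized objective and returns
    the better one. 'trusting' relies on messages (high task value, large
    communicative-power penalty); 'autonomous' ignores them (lower task
    value, no penalty)."""
    trusting = 1.0 + lam * (-0.8)    # q_task = 1.0, comm-power term -0.8
    autonomous = 0.7 + lam * 0.0     # q_task = 0.7, no communicative power
    return "trusting" if trusting > autonomous else "autonomous"

# For these numbers the crossover sits at lam = 0.375.
low, high = pick_strategy(0.1), pick_strategy(1.0)
```

A designer expecting benign partners would operate below the crossover; one expecting adversarial messaging would operate above it, trading some cooperative performance for resilience.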

Future Directions

The framework invites several extensions. Paramount among these is the formalization and systematic study of power aggregation mechanisms for settings with multiple, possibly colluding, adversaries, as well as the adaptation to $k$-step or temporally extended adversarial strategies. Exploration of parameterized or adaptive regularization coefficients could endow agents with dynamic resilience, adjusting their communicative trust in response to online estimation of misalignment or adversarial presence. Finally, integrating this formalism with differentiable/learned communication architectures could further automate the emergence of robust, interpretable conventions in large-scale agent societies.

Conclusion

Power regularization over communication represents a formal, tractable means of enforcing autonomy and robustness in cooperative MARL. By distinguishing and penalizing excess influence exerted through communicative channels, this approach minimizes the risk of exploitation in environments with misaligned agents, while preserving the performance benefits of communication in the cooperative regime. Experimental evidence in standard benchmarks validates the effectiveness of the method, highlighting a principled path forward for resilient MARL system design.
