Reduce FRQD-learning communication overhead without sacrificing convergence guarantees

Develop a communication-efficient variant of the Fully Resilient QD (FRQD)-learning algorithm that reduces the worst-case per-agent communication complexity below O(|N_i(t)|^2) per time step while preserving the almost sure convergence to the optimal value functions stated in Theorem 1 under F-total Byzantine edge attacks on (6F+1,0)-redundant communication graphs.

Background

The proposed FRQD-learning algorithm achieves almost sure convergence to the optimal value functions under F-total Byzantine edge attacks by using a two-hop redundancy-based filtering mechanism. This mechanism requires two communication rounds per time step: one to send an agent's own Q-values to its neighbors and one to relay its neighbors' messages, which in the worst case imposes O(|N_i(t)|^2) communication per agent.
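To see where the quadratic term comes from, the per-step message count of such a two-round send-and-relay protocol can be sketched as below. This is an illustrative accounting only, not code from the paper; the function name and the assumption that every received message is relayed to every neighbor (the worst case) are ours.

```python
def two_round_message_count(num_neighbors: int) -> int:
    """Worst-case messages agent i sends in one time step of a
    two-round send-and-relay protocol (illustrative sketch).

    Round 1: send own Q-values to each of the |N_i(t)| neighbors.
    Round 2: relay each of the up-to-|N_i(t)| received messages
             to each neighbor, i.e. up to |N_i(t)|^2 messages.
    """
    round1 = num_neighbors                   # own Q-values out
    round2 = num_neighbors * num_neighbors   # relayed neighbor messages
    return round1 + round2                   # O(|N_i(t)|^2) overall


if __name__ == "__main__":
    # Doubling the neighborhood roughly quadruples the relay cost.
    for d in (2, 4, 8, 16):
        print(d, two_round_message_count(d))
```

The relay round dominates, so any communication-efficient variant must reduce or compress the second round without losing the two-hop redundancy the filtering mechanism depends on.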

The authors explicitly note that reducing this communication overhead while maintaining the same convergence guarantees remains unaddressed. Theorem 1 establishes the target guarantees—almost sure convergence to the optimal value functions and policies under (6F+1,0)-redundant graphs—so the open problem is to achieve equivalent guarantees with lower communication cost.

References

Reducing this overhead while preserving the same convergence guarantees (Theorem 1) remains future work.

Fully Byzantine-Resilient Distributed Multi-Agent Q-Learning  (2604.02791 - Lee et al., 3 Apr 2026) in Remark following Algorithm 1, Subsection "Fully Resilient QD (FRQD)-Learning" (Section 3)