Learning to Act and Cooperate for Distributed Black-Box Consensus Optimization

Published 1 May 2026 in cs.MA and cs.NE | (2605.00691v1)

Abstract: Distributed black-box consensus optimization is a fundamental problem in multi-agent systems, where agents must improve a global objective using only local objective queries and limited neighbor communication. Existing methods largely rely on handcrafted update rules and static cooperation patterns, which often struggle to balance local adaptation, global coordination, and communication efficiency in heterogeneous nonconvex environments. In this paper, we take an initial step toward trajectory-driven self-design for distributed black-box consensus optimization. We first redesign the agent-level swarm dynamics with an adaptive internal mechanism tailored to decentralized consensus settings, improving the balance between exploration, convergence, and local escape. Built on top of this adaptive execution layer, we propose Learning to Act and Cooperate (LAC-MAS), a trajectory-driven framework in which LLMs provide sparse high-level guidance for shaping both agent-internal action behaviors and agent-external cooperation patterns from historical optimization trajectories. We further introduce a phased cognitive scheduling strategy to activate different forms of adaptation in a resource-aware manner. Experiments on standard distributed black-box benchmarks and real-world distributed tasks show that LAC-MAS consistently improves solution quality, convergence efficiency, and communication efficiency over strong baselines, suggesting a practical route from handcrafted distributed coordination toward self-designing multi-agent optimization systems.


Authors (4)

Summary

The paper introduces LAC-MAS, a framework that uses trajectory-driven adaptations and LLM guidance to tune internal actions and neighbor cooperation in distributed optimization.
The paper demonstrates that combining learning-to-act with learning-to-cooperate significantly improves consensus speed and objective quality across benchmarks and real-world tasks.
The paper validates that phased scheduling and adaptive communication efficiently balance local exploration with global consensus, ensuring robust performance under heterogeneous feedback.

LLM-Assisted Distributed Black-Box Consensus Optimization: Summary and Analysis

Motivation and Problem Statement

The paper "Learning to Act and Cooperate for Distributed Black-Box Consensus Optimization" (2605.00691) addresses the fundamental challenge of multi-agent optimization under constraints of partial observability, limited communication, and non-convex, heterogeneous feedback. Conventional distributed black-box optimizers rely on handcrafted update rules and fixed cooperation patterns, resulting in rigid trade-offs between local adaptation, global consensus, and communication efficiency. The authors argue for learning-based mechanisms capable of trajectory-driven self-design, leveraging historical optimization experience to adapt both internal behaviors and external interactions, without centralized supervision.

The study focuses on consensus optimization over a fixed communication topology: each agent can query only its own local black-box objective and exchange messages with its neighbors, and the agents jointly minimize the average of the local objectives subject to the constraint that all agents agree on a common solution. Because neither gradients nor the global objective are available, every adaptation must be driven by function evaluations and locally exchanged information.
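For concreteness, the problem can be written in the usual consensus form. The notation here (N agents, local objectives f_i, local copies x_i of the decision variable, neighbor sets \mathcal{N}_i on a connected graph) is our own assumption and may differ from the paper's:

```latex
\min_{x_1,\dots,x_N \in \mathbb{R}^d} \; \frac{1}{N}\sum_{i=1}^{N} f_i(x_i)
\qquad \text{s.t.} \qquad x_i = x_j \quad \forall i,\; \forall j \in \mathcal{N}_i
```

Each f_i can only be evaluated, never differentiated, and on a connected graph the edge-wise equality constraints force all x_i to a single common solution.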

Framework Design: LAC-MAS Architecture

Adaptive Execution Layer

Each agent operates a population-based black-box optimizer, redesigned from classic particle swarm optimization (PSO) for decentralized settings. Agents maintain local particle populations whose dispersion informs the adaptive modulation of exploration, convergence, and escape dynamics. The internal action mechanism selects behavioral coefficients based on real-time divergence statistics, actively regulating swarm velocity updates and step-size randomness. An LLM further refines these behavioral modes based on historical trajectory data, allowing adaptive coefficients to be learned and refreshed in response to optimization progress, rather than fixed a priori.
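A minimal Python sketch of such a divergence-driven execution layer follows. The three modes, their coefficient values, the dispersion threshold, and the stagnation rule are all hypothetical placeholders rather than the paper's settings; in LAC-MAS the LLM would refine the coefficient table from trajectory data instead of leaving it fixed.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    name: str
    inertia: float     # w: fraction of the previous velocity carried over
    c_personal: float  # c1: pull toward each particle's personal best
    c_social: float    # c2: pull toward the agent's current best position
    noise: float       # std. dev. of Gaussian jitter added to each velocity component


# Hypothetical coefficient table (not the paper's values).
MODES = {
    "explore": Mode("explore", 0.9, 2.0, 1.0, 0.10),
    "converge": Mode("converge", 0.5, 1.0, 2.0, 0.01),
    "escape": Mode("escape", 0.9, 0.5, 0.5, 0.50),
}


def dispersion(positions):
    """Mean Euclidean distance of the particles to their centroid."""
    n, d = len(positions), len(positions[0])
    centroid = [sum(p[k] for p in positions) / n for k in range(d)]
    return sum(
        sum((p[k] - centroid[k]) ** 2 for k in range(d)) ** 0.5 for p in positions
    ) / n


def select_mode(disp, stagnant_iters, low=0.05, patience=10):
    """Map divergence statistics to a behavioral mode (thresholds are assumptions)."""
    if disp < low and stagnant_iters >= patience:
        return MODES["escape"]    # collapsed and not improving: perturb strongly
    if disp < low:
        return MODES["converge"]  # concentrated and still improving: refine
    return MODES["explore"]       # still spread out: keep searching broadly


def pso_step(positions, velocities, pbest, gbest, mode, rng: random.Random):
    """One in-place velocity/position update using the selected mode's coefficients."""
    for i, (x, v) in enumerate(zip(positions, velocities)):
        for k in range(len(x)):
            r1, r2 = rng.random(), rng.random()
            v[k] = (
                mode.inertia * v[k]
                + mode.c_personal * r1 * (pbest[i][k] - x[k])
                + mode.c_social * r2 * (gbest[k] - x[k])
                + rng.gauss(0.0, mode.noise)
            )
            x[k] += v[k]
```

Recomputing `dispersion` each iteration and passing the result to `select_mode` is what lets the swarm switch regimes; the LLM-refinement step would replace entries of `MODES` rather than change the update rule itself.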

Trajectory-Driven Cooperation

Beyond internal adaptation, agents learn to modulate the influence of each neighbor during consensus formation. Agents build trajectory descriptors (encoding neighbor fitness, population divergence, and state variation) and use LLM inference on them to adjust neighbor weights dynamically. The weights are normalized so that each agent's weights sum to one, which preserves the consensus structure; within that constraint, high-value neighbors are prioritized and stagnated or biased agents are down-weighted, improving robustness and information utilization.
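The normalization requirement can be illustrated with a short sketch. Here `scores` stands in for per-neighbor utilities an LLM might return (a hypothetical interface), and a softmax over the agent itself plus its neighbors is our choice for mapping scores to positive weights that sum to one; the paper may normalize differently.

```python
import math


def neighbor_weights(self_id, neighbors, scores, self_score=1.0, temperature=1.0):
    """Map utility scores (self + neighbors) to one row of a row-stochastic matrix.

    A softmax keeps every weight strictly positive, including the self-weight,
    and makes the row sum to one.
    """
    ids = [self_id] + list(neighbors)
    raw = [self_score] + [scores[j] for j in neighbors]
    m = max(raw)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in raw]
    total = sum(exps)
    return {j: e / total for j, e in zip(ids, exps)}


def consensus_step(states, weights):
    """Return x_i <- sum_j w_ij * x_j for one agent, given its weight row."""
    d = len(next(iter(states.values())))
    return [sum(w * states[j][k] for j, w in weights.items()) for k in range(d)]
```

A higher `temperature` flattens the weights toward uniform averaging, while a lower one concentrates trust on the best-scored neighbors.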

Phased Cognitive Guidance

A staged scheduling mechanism, Phased Cognitive Guidance (PCG), coordinates the refresh of internal action and external cooperation adaptations. Internal behavioral updates are triggered only at select milestones aligned with substantial regime changes, while cooperation guidance is refreshed more regularly to track evolving neighbor utility. This approach prevents excessive instability, reduces resource overhead, and aligns adaptation granularities with functional optimization phases: trajectory accumulation, local action learning, joint act-and-cooperate learning, and late-stage consensus stabilization.
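One way to express PCG as a schedule is sketched below. The stage boundaries (10%, 40%, and 85% of the iteration budget), the cooperation refresh interval, and the rule that internal refreshes happen only on entering a learning stage are illustrative assumptions, not values reported in the paper.

```python
from dataclasses import dataclass

# Hypothetical stage boundaries, as fractions of the total iteration budget.
STAGES = [
    (0.10, "trajectory_accumulation"),
    (0.40, "local_action_learning"),
    (0.85, "joint_act_and_cooperate"),
    (1.00, "consensus_stabilization"),
]


@dataclass
class Refresh:
    stage: str
    refresh_internal: bool     # re-query the LLM for internal action modes
    refresh_cooperation: bool  # re-query the LLM for neighbor weights


def stage_of(t, total):
    """Return the name of the stage containing iteration t (0-based)."""
    frac = t / total
    for bound, name in STAGES:
        if frac < bound:
            return name
    return STAGES[-1][1]


def schedule(t, total, coop_interval=20):
    """Decide which kinds of LLM guidance to refresh at iteration t."""
    stage = stage_of(t, total)
    entered = t == 0 or stage != stage_of(t - 1, total)  # milestone: new stage
    internal = entered and stage in ("local_action_learning", "joint_act_and_cooperate")
    coop = stage == "joint_act_and_cooperate" and t % coop_interval == 0
    return Refresh(stage, internal, coop)
```

With a budget of 100 iterations, this yields two internal refreshes (at the start of each learning stage) and three cooperation refreshes, so internal behavior changes rarely while neighbor weights track utility more often.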

Consensus Preservation

Under mild standard assumptions—fixed connected topology, normalized cooperation weights, bounded internal coefficients, and asymptotically vanishing perturbations—the closed-loop LAC-MAS dynamics admit consensus convergence guarantees. The system acts as a row-stochastic switched consensus process with finite internal regime switches and stage-wise guidance refresh. Theoretical analysis shows that the trajectory-driven adaptations do not compromise consensus guarantees or decentralized execution, and the process contracts disagreement asymptotically.
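The mechanism behind the guarantee can be checked numerically. The sketch below is not the paper's proof: it simulates scalar agents on a ring with positive self-weights, row-stochastic weights re-drawn at every step (a switching process), and summable perturbations of size 1/t^2, then records the spread max(x) - min(x). The ring topology, weight ranges, and noise schedule are our assumptions.

```python
import random


def ring_neighbors(n):
    """Fixed, connected topology: each agent talks to its two ring neighbors."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def random_row(i, nbrs, rng, lo=0.2, hi=1.0):
    """Positive self-weight plus positive neighbor weights, normalized to sum to 1."""
    raw = {j: rng.uniform(lo, hi) for j in [i] + nbrs}
    total = sum(raw.values())
    return {j: w / total for j, w in raw.items()}


def simulate(n=8, steps=200, seed=0):
    """Run the perturbed switched consensus process and return the spread history."""
    rng = random.Random(seed)
    nbrs = ring_neighbors(n)
    x = [rng.uniform(-10.0, 10.0) for _ in range(n)]
    spreads = [max(x) - min(x)]
    for t in range(1, steps + 1):
        rows = [random_row(i, nbrs[i], rng) for i in range(n)]  # switching weights
        eps = 1.0 / t ** 2  # vanishing, summable perturbation bound
        x = [
            sum(w * x[j] for j, w in rows[i].items()) + rng.uniform(-eps, eps)
            for i in range(n)
        ]
        spreads.append(max(x) - min(x))
    return spreads
```

Without perturbations, each update is a convex combination of neighbor states, so the maximum can never increase and the minimum can never decrease; connectivity and positive weights turn this into strict contraction over a few steps, and the vanishing noise is eventually dominated.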

Experimental Evaluation and Results

Benchmark Suite

LAC-MAS is evaluated on ten standard distributed black-box optimization benchmarks (100 variables, 20 agents, strictly decentralized querying). Baselines include MASOIE (adaptive multi-agent swarm), GFPDO (population-based consensus), RGF (random gradient-free), and DAPSO (distributed PSO). Metrics include final solution quality, communication cost, and consensus disagreement.
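The summary does not define the metrics precisely, so the helpers below use common conventions as assumptions: disagreement as the mean squared distance of agent states to their average, communication cost as the total number of scalars transmitted, and solution quality as the global average objective evaluated at a point.

```python
def consensus_disagreement(states):
    """Mean squared Euclidean distance of agent states to their average."""
    n, d = len(states), len(states[0])
    mean = [sum(x[k] for x in states) / n for k in range(d)]
    return sum(sum((x[k] - mean[k]) ** 2 for k in range(d)) for x in states) / n


def communication_cost(messages_per_round, floats_per_message):
    """Total scalars sent, given the number of messages transmitted in each round."""
    return sum(messages_per_round) * floats_per_message


def global_objective(local_fs, x):
    """Average of the local objectives at a common point x."""
    return sum(f(x) for f in local_fs) / len(local_fs)
```

For example, if each of 20 agents sent one 100-dimensional state to each of two ring neighbors (a hypothetical topology), every round would cost 40 messages, or 4,000 scalars.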

Performance and Ablation

LAC-MAS matches or exceeds the baselines in final fitness and communication efficiency on most benchmarks. Its improvements in objective quality and consensus speed are statistically significant and are most pronounced on multimodal or heterogeneous landscapes, where adapting internal and external strategies is critical. On functions with highly directional or specialized dynamics, LAC-MAS remains robust and does not degrade relative to the baselines.

Ablation studies demonstrate complementary benefits of learning-to-act (internal adaptation) and learning-to-cooperate (neighbor weighting adaptation). Internal learning accelerates objective minimization and enhances local escape; cooperative learning improves consensus speed and information utilization. Full integration yields the most stable and efficient outcomes.

Transfer Validation

LAC-MAS also generalizes to real-world tasks, exemplified by multi-target localization in wireless sensor networks, a distributed consensus scenario with partial observability and heterogeneous objectives. The framework achieves lower estimation error for every tested number of targets and maintains superior robustness as task complexity increases, confirming the practical utility of trajectory-driven adaptation under constrained communication.
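As an illustration of why this task fits the consensus setting, the sketch below builds each sensor's local objective from noisy range measurements to every target. The range-only measurement model, the known association between measurements and targets, and the stacking of 2-D target estimates into one vector are our assumptions; the paper's exact localization objective is not given here.

```python
import math
import random


def make_local_objective(sensor, targets, noise_std, rng: random.Random):
    """Build one sensor's black-box objective from its noisy ranges to each target.

    x stacks the 2-D target estimates as [t1x, t1y, t2x, t2y, ...]. Each sensor
    sees only its own measurements, so the local objectives are heterogeneous.
    """
    ranges = [math.dist(sensor, t) + rng.gauss(0.0, noise_std) for t in targets]

    def f(x):
        est = [(x[2 * k], x[2 * k + 1]) for k in range(len(targets))]
        # Squared range residuals between predicted and measured distances.
        return sum((math.dist(sensor, p) - r) ** 2 for p, r in zip(est, ranges))

    return f
```

A single sensor's ranges leave each target ambiguous along a circle, so only the average of the local objectives over enough well-placed sensors pins down all targets, which is what the consensus formulation optimizes.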

Implementation and LLM Integration

Agents deploy locally hosted LLMs (e.g., DeepSeek-R1 14B), interacting via lightweight prompts that encode local and neighbor trajectory statistics. LLM outputs are used for both internal behavioral mode selection and external cooperation weighting, but are refreshed sparsely and asynchronously per PCG. This approach preserves full decentralization, prevents oscillatory adaptations, and supports scalable execution without dependence on central resources or global states.
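A sketch of the prompt-and-parse boundary is shown below. The prompt wording, the JSON reply schema, the score bounds [0, 10], and the fallback defaults are hypothetical; the point is that a malformed or out-of-range LLM reply should degrade to safe, bounded values so that decentralized execution and the bounded-coefficient assumption behind the consensus analysis are not violated.

```python
import json
import math

VALID_MODES = ("explore", "converge", "escape")  # hypothetical mode names


def build_prompt(agent_id, stats, neighbor_stats):
    """Summarize local and neighbor trajectory statistics in a compact prompt."""
    lines = [
        f"Agent {agent_id}: best={stats['best']:.4g}, "
        f"dispersion={stats['dispersion']:.3g}, stagnant_iters={stats['stagnant']}"
    ]
    for j, s in sorted(neighbor_stats.items()):
        lines.append(f"Neighbor {j}: best={s['best']:.4g}, change={s['change']:.3g}")
    lines.append(
        'Reply with JSON: {"mode": "explore|converge|escape", '
        '"weights": {"<neighbor id>": <score>}}'
    )
    return "\n".join(lines)


def parse_guidance(reply, neighbors, default_mode="explore", max_score=10.0):
    """Parse an LLM reply defensively, returning a valid mode and bounded scores."""
    try:
        data = json.loads(reply)
    except (ValueError, TypeError):  # malformed text or non-string input
        data = {}
    if not isinstance(data, dict):
        data = {}
    mode = data.get("mode")
    if mode not in VALID_MODES:
        mode = default_mode
    raw = data.get("weights")
    if not isinstance(raw, dict):
        raw = {}
    scores = {}
    for j in neighbors:
        try:
            value = float(raw.get(str(j), 1.0))  # JSON object keys are strings
        except (TypeError, ValueError):
            value = 1.0
        if not math.isfinite(value):
            value = 1.0
        scores[j] = min(max(value, 0.0), max_score)  # clamp into [0, max_score]
    return mode, scores
```

Because refreshes are sparse, this parse step would run only at scheduled refresh points, and the clamped scores still need normalizing before they are used as consensus weights.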

Implications and Future Directions

Practical

The LAC-MAS framework enables self-designing distributed optimizers that flexibly respond to heterogeneity and limited information, supporting applications in cooperative sensing, resource allocation, wireless communications, and distributed control. Communication efficiency, scalability, and robustness are enhanced through trajectory-driven guidance, potentially reducing infrastructural and computational costs in large-scale systems.

Theoretical

Trajectory-driven adaptation offers a viable alternative to handcrafted rule-based coordination, promising broader generalization beyond gradient-based, centralized, or static swarm approaches. The integration of LLMs as high-level guidance modules, rather than end-to-end solvers, illustrates a scalable pathway to algorithmic meta-optimization and auto-design in decentralized settings. Future work could explore extensions to dynamic topologies, asynchronous environments, or non-consensus objectives.

AI Research

The self-design paradigm advocated by LAC-MAS raises questions on meta-learning in distributed optimization, the robustness of LLM-guided adaptations under adversarial or non-stationary feedback, and the potential for emergent global behaviors from locally driven learning. Continued exploration may uncover new mechanisms for distributed intelligence, dynamic cooperation protocols, and performance guarantees in large multi-agent systems.

Conclusion

LAC-MAS systematically addresses distributed black-box consensus optimization by jointly learning agent-internal actions and agent-external cooperation via trajectory-driven LLM guidance and phased scheduling. The framework achieves improved solution quality, communication efficiency, and consensus stability over competitive baselines, and generalizes to realistic tasks. The study substantiates the role of meta-learning and language-driven guidance in decentralized optimization, establishing a foundation for more adaptive and scalable multi-agent systems.
