CoMAS: Multi-Domain Frameworks
- CoMAS is a suite of frameworks spanning multi-agent reinforcement learning, 3D human motion synthesis, wireless communications, and massive random access, each addressing unique challenges with advanced algorithms.
- In multi-agent settings, it leverages intrinsic interaction rewards and decentralized updates to enable self-evolving LLM-based agents with demonstrable performance gains.
- In wireless applications, CoMAS employs constructive precoding and coding techniques to reduce receiver complexity and achieve power savings up to 6 dB with improved SER.
CoMAS refers to multiple distinct frameworks and methodologies across research domains, each denoted by a variant of the acronym. These include “Co-Evolving Multi-Agent Systems via Interaction Rewards” (multi-agent RL), “Compositional Human Motion Generation with Multi-modal Agents” (3D motion synthesis), “Constructive Multiple Access” (wireless communications), and “Coded Orthogonal Modulation Multiple-Access” (multiple access channel coding). Each framework addresses domain-specific challenges through advanced algorithmic and architectural innovations.
1. Co-Evolving Multi-Agent Systems via Interaction Rewards (CoMAS)
CoMAS, introduced by Xue et al. (9 Oct 2025), is an architecture for self-evolving LLM-based agents trained entirely on intrinsic, interaction-based rewards. Its key innovation is co-evolution without extrinsic reward models or external data, mirroring how humans improve through mutual discussion and critique.
Core Architecture
- Agent Pool: A set of LLM-based agents, each with its own parameterization.
- Interaction Phase: For each task, every round consists of a solver proposing a solution, several evaluators critiquing it, and a judge scoring the interaction in a fixed format (an integer from a fixed range).
- Intrinsic Reward Formulation: Rewards are zero-sum between proposer and evaluator, parsed directly from the judge's output, and normalized to a fixed range.
- Learning: Each agent’s buffer of experiences is used for advantage-based policy optimization with a clipped PPO-like surrogate objective.
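The reward bullet above can be made concrete with a small sketch. This is an illustration only: the score range (`lo=1`, `hi=3`), the target interval [-1, 1], and the function names `parse_score` and `interaction_rewards` are assumptions, not details taken from Xue et al.

```python
import re

def parse_score(judge_output, lo=1, hi=3):
    """Return the first integer in the judge's text, or None if absent or out of range.

    The range [lo, hi] is an assumed scoring scale, not the paper's.
    """
    match = re.search(r"-?\d+", judge_output)
    if match is None:
        return None
    score = int(match.group())
    return score if lo <= score <= hi else None

def interaction_rewards(judge_output, lo=1, hi=3):
    """Map a judge score to (solver_reward, evaluator_reward).

    The solver reward is normalized to [-1, 1] (assumed interval) and the
    evaluator receives its negation, so the two rewards always sum to zero.
    """
    score = parse_score(judge_output, lo, hi)
    if score is None:
        return None  # unparseable judgments yield no training signal here
    solver = 2.0 * (score - lo) / (hi - lo) - 1.0
    return solver, -solver
```

Under these assumptions a top score fully rewards the solver and penalizes the evaluator, a bottom score does the opposite, and a malformed judgment is simply skipped.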
Key Mathematical Elements
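- Advantage Assignment: Each agent's advantages are computed from the intrinsic rewards stored in its own experience buffer.
- Objective: Each agent maximizes a clipped PPO-like surrogate built from those advantages.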
- Decentralized Updates: All agents optimize independently and can be heterogeneous.
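For orientation, the generic clipped surrogate from the PPO literature has the form below. It is given as a reference point for the "PPO-like" objective, not as the paper's exact formulation; the symbols ($\theta_i$ for the parameters of agent $i$, $\rho_t$ for the probability ratio, $\hat{A}_t$ for the advantage estimate, $\epsilon$ for the clipping width) are assumed notation.

$$
\rho_t(\theta_i) = \frac{\pi_{\theta_i}(a_t \mid s_t)}{\pi_{\theta_i^{\text{old}}}(a_t \mid s_t)}, \qquad
J(\theta_i) = \mathbb{E}_t\!\left[\min\!\big(\rho_t(\theta_i)\,\hat{A}_t,\ \operatorname{clip}(\rho_t(\theta_i),\,1-\epsilon,\,1+\epsilon)\,\hat{A}_t\big)\right]
$$

Because each agent $i$ maximizes its own $J(\theta_i)$ over its own buffer, no parameters or gradients need to be shared, which is what permits heterogeneous agents.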
Experimental Findings
- Superior accuracy or pass@1 metrics across GSM8K, HumanEval, SciBench, MMLU compared to untrained LLMs, MAPoRL (external verifier-based), and TTRL.
- Multi-agent debate/consistency setups yield performance gains with increased agent count and model heterogeneity.
- Reward ablation demonstrates necessity of full interaction/evaluation/scoring triad.
Significance
CoMAS establishes a scalable, fully self-supervised paradigm for improving LLM-based agents in multi-agent settings without external reward models, demonstrating monotonic performance gains with increased diversity and number of agents (Xue et al., 9 Oct 2025).
2. Compositional Human Motion Generation with Multi-modal Agents (CoMA/CoMAS)
The “CoMAS” framework for 3D motion generation (Sun et al., 2024) uses collaborative agents powered by LLMs and vision-language models (VLMs) for motion synthesis, editing, and comprehension.
Framework Components
- Task Planner (GPT-4o): Parses long, compositional motion prompts into base and local edit instructions grounded in a motion dataset vocabulary.
- SPAM Motion Generator: A masked, spatially-aware motion transformer with body-part VQ-VAE encoders for decomposing human motion into four body-part streams. Token-based codebooks allow fine-grained control.
- Motion Reviewer (VideoChat2 + GPT-4o): Renders and captions generated motion, compares with the original prompt, and issues corrective edit instructions.
- Trajectory Editor: Applies explicit, LLM-generated trajectory constraints.
- Hierarchical Generation: Breaks down long prompts into temporal segments, composes base motion and edits, and blends results.
Key Algorithms
- Spatial Masked Transformer applies a loss computed only on masked tokens for base-level generation, with residual prediction for hierarchical codebook layers. Space–time self-attention ensures fidelity to both spatial (body-part) and temporal constraints.
- Text-guided Editing: Arbitrary frame/body-part masks allow iterative user- or reviewer-driven edits without retraining.
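The idea of a loss computed only on masked tokens can be sketched in a few lines. This is a generic masked cross-entropy in plain Python, not SPAM's training code; the function name and the list-based data layout are assumptions made for illustration.

```python
import math

def masked_cross_entropy(logits, targets, mask):
    """Mean cross-entropy over positions where mask is True.

    logits:  per-position lists of unnormalized scores over the codebook
    targets: ground-truth token index for each position
    mask:    True where the token was hidden from the model's input
    """
    total, count = 0.0, 0
    for scores, target, is_masked in zip(logits, targets, mask):
        if not is_masked:
            continue  # visible tokens contribute nothing to the loss
        peak = max(scores)  # subtract the max for a numerically stable log-sum-exp
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_norm - scores[target]
        count += 1
    return total / count if count else 0.0
```

Only the hidden positions are scored, so a confident but wrong prediction at a visible position leaves the loss unchanged. The same masking logic is what lets arbitrary frame or body-part masks be chosen at edit time.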
Evaluation
- On HumanML3D, SPAM achieves R-Precision@1 = 0.526 with low FID and MultiModal Dist, matching or outperforming baselines on compositional, spatially detailed prompts.
- User studies: CoMA yields mean ratings of 4.2/5 and top Motion Alignment Scores under context-rich, multi-stage prompts, exceeding prior masked-transformer, diffusion, and motion-LLM approaches.
Significance
CoMA/CoMAS enables efficient production, comprehension, and editing of complex, composite motions with strong prompt alignment, demonstrating robust spatial decomposition and a closed self-corrective loop for state-of-the-art motion generation (Sun et al., 2024).
3. Constructive Multiple Access (CoMA) in Wireless Communications
In multiple access communications, CoMA (Salem et al., 2022) rethinks the conventional NOMA design by exploiting constructive interference alignment, thereby eliminating the need for successive interference cancellation (SIC) at the receiver.
System Model
- MISO Downlink NOMA Pair: a multi-antenna BS serves two single-antenna users, indexed 1 (“strong”) and 2 (“weak”), with perfect CSI at the BS.
- Constructive Precoding: BS designs beamformers so that at each user, the aggregate received symbol (desired + interferer) lands within the intended decision region of the user’s constellation.
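For PSK, the requirement that the received symbol land inside the intended decision region is commonly written in the constructive-interference literature as a test on the received sample after rotating it by the conjugate of the intended symbol. The sketch below shows that generic test, assuming unit-modulus M-PSK symbols and a noiseless sample; the function name and the margin `gamma` are illustrative and not taken from Salem et al.

```python
import cmath
import math

def in_constructive_region(y, s, M, gamma=0.0):
    """Check whether noiseless sample y lies in the constructive region of s.

    y:     complex received sample (desired plus interfering contribution)
    s:     intended unit-modulus M-PSK symbol
    M:     PSK constellation order
    gamma: required distance from the decision boundaries (assumed margin)
    """
    z = y * s.conjugate()  # rotate so the intended symbol sits on the positive real axis
    # Inside the sector of half-angle pi/M, shifted outward by gamma.
    return abs(z.imag) <= (z.real - gamma) * math.tan(math.pi / M)

# Example: QPSK symbol at 45 degrees.
s = cmath.exp(1j * math.pi / 4)
pushed_outward = in_constructive_region(2 * s, s, M=4)                        # True
rotated_away = in_constructive_region(s * cmath.exp(1j * math.pi / 3), s, 4)  # False
```

A precoder designed with this kind of constraint keeps the aggregate symbol on the correct side of every decision boundary, which is why the receiver can detect its own symbol directly instead of first cancelling the other user's signal.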
Optimization Strategies
- Power Minimization under QoS: QCQP with linearized constraints ensures QoS SINR for both users, enforcing constructive interference.
- SER Minimization under Power Constraint: Block coordinate ascent algorithm with auxiliary variables maximizes minimum user SNR under total power, with “constructive interference” constraints.
- Receiver Complexity Analysis: CoMA eliminates SIC and reduces the number of ML detection operations, giving a lower per-symbol complexity per user than NOMA, where the cost grows with the constellation order.
Results
- Power savings up to 6 dB (2-antenna BS) and 4–8 dB in larger arrays compared to NOMA/OMA.
- SER improvements over NOMA (e.g., with 2 antennas at 10 dB).
- 40–60% reduction in per-frame receiver operations.
Practical Applicability
CoMA is well suited to scenarios that prioritize low device complexity and latency, such as massive IoT or URLLC in sub-6 GHz 5G deployments, but it is currently best adapted to PSK signaling (Salem et al., 2022).
4. Coded Orthogonal Modulation Multiple-Access (CoMAS) for Massive Random Access
The CoMAS scheme for the multiple-antenna MAC (Fengler et al., 2023) is a low-complexity approach that combines outer $q$-ary coding, orthogonal waveform modulation, and multi-antenna approximate message passing for massive unsourced or traditional random access.
System and Code Design
- Transmitter: Single-antenna users encode their information bits into $q$-ary codewords, and each code symbol is modulated onto one of a set of orthonormal waveforms (e.g., time-frequency chirps), achieving per-user orthogonality in each slot.
- Channel Model: A multi-antenna BS observes the superposition of all active users' codewords.
- Decoder:
- MMV-AMP front end delivers, per symbol slot, a candidate set for each user symbol.
- The outer “A-channel” code corrects residual uncertainty (false alarms, erasures) using a PUPE-optimized code.
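To illustrate what a per-slot candidate set looks like, the sketch below superimposes several users' orthogonally modulated symbols at a multi-antenna receiver and recovers the set of active symbol indices. It deliberately replaces the MMV-AMP front end with a plain matched-filter energy detector, and it uses unitary DFT columns rather than chirps; the i.i.d. Rayleigh fading model, the threshold, and all function names are assumptions.

```python
import cmath
import math
import random

def dft_waveform(q, Q):
    """q-th column of the unitary Q-point DFT matrix (one orthonormal waveform)."""
    return [cmath.exp(2j * math.pi * q * n / Q) / math.sqrt(Q) for n in range(Q)]

def transmit(symbols, Q, M, noise_std=0.0, seed=0):
    """Return the Q x M received slot for users sending the given symbol indices."""
    rng = random.Random(seed)
    Y = [[0j] * M for _ in range(Q)]
    for s in symbols:
        # Assumed i.i.d. Rayleigh fading: one complex gain per user and antenna.
        h = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2) for _ in range(M)]
        w = dft_waveform(s, Q)
        for n in range(Q):
            for m in range(M):
                Y[n][m] += w[n] * h[m]
    for n in range(Q):
        for m in range(M):
            Y[n][m] += noise_std * complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2)
    return Y

def candidate_set(Y, Q, threshold):
    """Indices whose matched-filter energy, summed over antennas, exceeds threshold."""
    M = len(Y[0])
    active = set()
    for q in range(Q):
        w = dft_waveform(q, Q)
        energy = 0.0
        for m in range(M):
            corr = sum(w[n].conjugate() * Y[n][m] for n in range(Q))
            energy += abs(corr) ** 2
        if energy > threshold:
            active.add(q)
    return active

# Three users in one slot; two of them happen to pick the same symbol.
Y = transmit([2, 7, 7], Q=16, M=4, noise_std=0.01, seed=0)
detected = candidate_set(Y, Q=16, threshold=0.1)
```

The detector reports which symbols are present in the slot but not which user sent them or how many users share a symbol. Resolving that ambiguity across slots is the job of the outer code.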
Key Performance and Complexity Results
- Achieves nearly double the sum-spectral efficiency compared to MMSE/nearest-neighbor with Gaussian signaling.
- FFT-based implementation enables robust and efficient demodulation for chirped waveforms.
- Receiver complexity is characterized per AMP iteration, with $10$–$20$ iterations being typical in practice.
- Outer codes can be realized via sparse tree codes or BP decoding on factor graphs.
Scaling Laws
- The minimum number of BS antennas required for reliable decoding at low PUPE reflects a direct trade-off between spatial resources, modulation order, and user load.
- Retains substantial performance advantages even under imperfect CSI (MMSE-based channel estimation).
5. Contextual Massing Generation (CoMa) in Architecture
CoMa (Maslov et al., 13 Jan 2026) has a similar acronym but refers to an automated architectural massing generation framework built on vision-language models and the CoMa-20K dataset. It addresses a distinct problem, the synthesis of functional programmatic geometry conditioned on multimodal urban context, and is not directly related to the multi-agent, communication, or MAC settings described above.
6. Terminological Note and Distinction Table
The abbreviation “CoMAS” and similar variants appear in a variety of unrelated contexts, so care is needed when citing them in scholarly work. The following table summarizes the domains:
| Variant | Domain | Core Mechanism | Reference |
|---|---|---|---|
| CoMAS | LLM Multi-Agent Self-Evolution | Intrinsic RL, agent debate/eval | (Xue et al., 9 Oct 2025) |
| CoMA/CoMAS | 3D Motion Generation | Agent-based, masked transformer | (Sun et al., 2024) |
| CoMA | MISO Wireless Comms | Constructive interference (NOMA) | (Salem et al., 2022) |
| CoMAS | Massive Random Access | Coded orthogonal modulation, MMV-AMP | (Fengler et al., 2023) |
| CoMa | Architectural Massing | VLM, multimodal context, CoMa-20K | (Maslov et al., 13 Jan 2026) |
7. Future Directions and Open Challenges
- In multi-agent LLM co-evolution, open questions include robustness against collusion, scaling to hundreds of agents, and dynamic role assignment.
- For 3D human motion, improving dataset diversity and integrating deeper geometric reasoning may further enhance compositional fidelity.
- In constructive and orthogonal multiple access, extending to broader modulation schemes and further reducing channel estimation demands remain areas of active investigation.
- Cross-domain, a persistent challenge is the clear and precise use of the “CoMAS” acronym; explicit contextualization is critical.