MARCO: Modular Multi-Agent Frameworks

Updated 3 July 2026

MARCO is a collection of modular, multi-agent frameworks that span machine learning, NLP, information retrieval, optimization, and hardware design.
The systems demonstrate practical benefits, achieving runtime reductions up to 30.9% in code optimization and accuracy gains over monolithic baselines in task orchestration.
MARCO’s design enables efficient multilingual IR, combinatorial optimization with memory-augmented reinforcement learning, and robust detoxification via context-aware reasoning.

MARCO refers to a diverse set of technical frameworks, algorithms, and datasets across machine learning, natural language processing, information retrieval, combinatorial optimization, hardware design, multi-agent systems, and practical tool orchestration. The acronym has distinct expansions depending on context (such as Multi-Agent Real-time Chat Orchestration, Multi-Agent Reactive Code Optimizer, Mask and Replace with Context, and others). What unifies these systems is the use of modular, compositional, or multi-agent techniques, often with reinforcement learning or optimization under strong domain constraints. The following sections summarize major MARCO systems and their foundational principles as documented in peer-reviewed and preprint literature.

1. MARCO in Multilingual and Low-Resource Information Retrieval

Recent advances recognize the structural limitations of English-centric IR benchmarks, driving the extension of large-scale evaluation to languages such as Urdu. "Enabling Low-Resource Language Retrieval: Establishing Baselines for Urdu MS MARCO" introduces the first Urdu IR dataset by fully translating the MS MARCO passage ranking corpus (8.8 million passages, ~500 K queries) using the IndicTrans2 distilled model (200 M parameters) (Butt et al., 2024). The resulting dataset directly mirrors the original’s format, enabling one-to-one benchmark comparisons.

For modeling, baseline zero-shot results are obtained by direct application of the mMARCO Reranker, a multilingual mT5-based cross-encoder pretrained on diverse scripts (including Urdu’s Perso-Arabic). Fine-tuning on the Urdu-translated corpus (Urdu-mT5-mMARCO) further sharpens retrieval performance, achieving MRR@10 = 0.247 and Recall@10 = 0.439 on the dev set. These constitute nontrivial gains over BM25 and zero-shot cross-lingual transfer (MRR@10 = 0.204, Recall@10 = 0.408), demonstrating that moderate-size synthetic resources—combined with strong multilingual pretraining—can substantially bridge quality gaps for low-resource scripts.
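The two metrics quoted above can be made concrete with a short sketch. It assumes the common definitions (MRR@k as the mean reciprocal rank of the first relevant passage in the top k, Recall@k as the fraction of a query's relevant passages found in the top k); the function names and toy IDs are illustrative and are not taken from the Urdu MS MARCO release.

```python
from typing import Dict, List, Set


def mrr_at_k(ranked: Dict[str, List[str]], relevant: Dict[str, Set[str]], k: int = 10) -> float:
    """Mean reciprocal rank of the first relevant passage within the top k."""
    total = 0.0
    for qid, docs in ranked.items():
        for rank, doc_id in enumerate(docs[:k], start=1):
            if doc_id in relevant.get(qid, set()):
                total += 1.0 / rank
                break  # only the first relevant hit contributes
    return total / len(ranked) if ranked else 0.0


def recall_at_k(ranked: Dict[str, List[str]], relevant: Dict[str, Set[str]], k: int = 10) -> float:
    """Average fraction of each query's relevant passages found in the top k."""
    scores = []
    for qid, docs in ranked.items():
        gold = relevant.get(qid, set())
        if not gold:
            continue  # queries without judgments are skipped
        scores.append(len(gold.intersection(docs[:k])) / len(gold))
    return sum(scores) / len(scores) if scores else 0.0


# Toy rankings with hypothetical IDs (not drawn from the actual corpus).
ranked = {"q1": ["p3", "p1", "p9"], "q2": ["p7", "p2", "p5"]}
relevant = {"q1": {"p1"}, "q2": {"p8"}}
print(mrr_at_k(ranked, relevant), recall_at_k(ranked, relevant))
```

In the toy data, q1 finds its relevant passage at rank 2 and q2 finds none, so both the reciprocal-rank and recall averages are pulled down by the missed query.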

Challenges identified include script directionality, morphological complexity, and translation artifacts potentially distorting supervised signals. Nonetheless, the work validates that language-specific fine-tuning on machine-translated data closes much of the gap relative to English, and sets a reproducible baseline for inclusion of other South Asian languages via adaptable synthetic pipelines (Butt et al., 2024).

2. Multi-Agent Architectures and Real-Time Orchestration

MARCO systems frequently employ multi-agent architectures to decompose complex problem domains into specialized submodules, coordinating via formalized communication and feedback. Notable frameworks include:

a. Code Optimization for HPC

MARCO (Multi-Agent Reactive Code Optimizer) operationalizes a two-agent iterative loop for automatic code optimization in high-performance computing (HPC) (Rahman et al., 6 May 2025). One agent specializes in code generation, integrating external optimization strategies from a real-time web-search component (retrieving recent arXiv, ACM, IEEE recommendations), while the second agent benchmarks performance (runtime, memory, FLOPS) under controlled hardware.

The algorithmic structure is summarized as a Markov-style iterative process. Let S^O_t = (C_{t-1}, M_{t-1}, K_t) denote the optimizer state at iteration t, where C_{t-1} is the previous code version, M_{t-1} its measured performance metrics, and K_t the knowledge retrieved by web search. The optimizer proposes C_t from this state, and the evaluator computes the new metrics M_t. Iterations continue until metric improvements fall below a threshold ε or a maximum number of steps T_max is reached. The multi-agent loop yields a 14.6% runtime reduction over strong LLM baselines, and enabling the web-search module for dynamic code hints yields a further 30.9% improvement, suggesting that static LLMs struggle to match multi-agent, knowledge-augmented systems for code optimization (Rahman et al., 6 May 2025).
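The control flow of this loop can be sketched as follows. This is a minimal illustration of the stopping rule only, assuming a single scalar metric where lower is better; the `propose`, `evaluate`, and `retrieve` callables and the toy stubs are hypothetical stand-ins for the LLM optimizer, the benchmarking agent, and the web-search component, not the authors' implementation.

```python
from typing import Callable, List, Tuple


def optimize(
    initial_code: str,
    propose: Callable[[str, float, List[str]], str],
    evaluate: Callable[[str], float],
    retrieve: Callable[[str], List[str]],
    epsilon: float = 0.01,
    t_max: int = 10,
) -> Tuple[str, float, int]:
    """Run propose -> evaluate until the gain drops below epsilon or t_max is hit.

    Returns the best code seen, its metric (lower is better), and the steps run.
    """
    code, metric = initial_code, evaluate(initial_code)
    best_code, best_metric = code, metric
    steps = 0
    for t in range(1, t_max + 1):
        steps = t
        knowledge = retrieve(code)                    # K_t
        candidate = propose(code, metric, knowledge)  # C_t from (C_{t-1}, M_{t-1}, K_t)
        new_metric = evaluate(candidate)              # M_t
        improvement = metric - new_metric
        code, metric = candidate, new_metric
        if new_metric < best_metric:
            best_code, best_metric = candidate, new_metric
        if improvement < epsilon:
            break  # converged: the last step gained less than epsilon
    return best_code, best_metric, steps


# Toy stubs: "runtime" is the string length, and each proposal trims one character.
def toy_evaluate(code: str) -> float:
    return float(len(code))


def toy_retrieve(code: str) -> List[str]:
    return ["hypothetical optimization hint"]


def toy_propose(code: str, metric: float, knowledge: List[str]) -> str:
    return code[:-1] if len(code) > 3 else code


print(optimize("xxxxxx", toy_propose, toy_evaluate, toy_retrieve, epsilon=0.5))
```

With the toy stubs, the loop improves for three steps and stops on the fourth, when the proposal no longer changes the metric.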

b. Real-Time LLM-Assisted Task Orchestration

MARCO (Multi-Agent Real-time Chat Orchestration) provides an orchestration platform for LLM-based assistants in multi-turn, tool-enabled automation (Shrimal et al., 2024). The framework isolates user intents, selects and executes domain-specific Task Execution Procedures (TEPs), and orchestrates a hierarchy of Task Agents guided by deterministic tool schemas. Guardrails are embedded at every layer, enforcing output format correctness, disallowing hallucinated function calls and parameters, and validating domain constraints. Reflection-driven retries adaptively reduce errors, yielding >28% raw accuracy gain and cost reduction relative to monolithic baselines.

Agent communication is structured through XML/JSON-interpretable exchanges, and the orchestration components (intent classification, RAG retrieval, agent hierarchy, and guardrails) are independently configurable (Shrimal et al., 2024). Empirical results on enterprise datasets demonstrate high task-execution accuracy (~94.5% and ~92.7% on the two datasets evaluated), low latency (~5.6 s), and ~33% cost savings.
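The guardrail-and-retry pattern described in this subsection can be illustrated with a small sketch. Everything here is assumed for illustration: the JSON call format, the `TOOL_SCHEMAS` table, and the `generate` callable standing in for an LLM are hypothetical and do not reproduce the framework's actual schemas or prompts.

```python
import json
from typing import Callable, Dict, List, Optional

# Hypothetical tool schema: tool name -> required parameter names.
TOOL_SCHEMAS: Dict[str, List[str]] = {
    "get_order_status": ["order_id"],
    "issue_refund": ["order_id", "amount"],
}


def check_tool_call(raw: str) -> Optional[str]:
    """Return None if the call passes every guardrail, else an error message."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return "output is not valid JSON"
    if not isinstance(call, dict):
        return "output must be a JSON object"
    name = call.get("tool")
    if not isinstance(name, str) or name not in TOOL_SCHEMAS:
        return f"unknown tool: {name!r}"  # hallucinated function call
    params = call.get("params", {})
    if not isinstance(params, dict):
        return "params must be a JSON object"
    expected = set(TOOL_SCHEMAS[name])
    if set(params) != expected:
        return f"parameters must be exactly {sorted(expected)}"  # hallucinated or missing parameter
    return None


def call_with_reflection(generate: Callable[[str], str], prompt: str, max_retries: int = 2) -> Optional[dict]:
    """Ask for a tool call, feeding guardrail errors back into the prompt on failure."""
    feedback = ""
    for _ in range(max_retries + 1):
        raw = generate(prompt + feedback)
        error = check_tool_call(raw)
        if error is None:
            return json.loads(raw)
        feedback = f"\nThe previous attempt was rejected: {error}. Please correct it."
    return None  # every attempt failed validation
```

A caller would pass its own model wrapper as `generate`; a rejected first attempt (for example, a call to a tool that is not in the schema) produces feedback that is appended to the next prompt.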

c. Hardware Design with Configurable Task Graphs

MARCO frameworks have also been applied to hardware design, using a configurable graph-based task-solving paradigm (Ho et al., 25 Feb 2025). Here, hardware problem decomposition is modeled as a directed graph in which each node is a sub-task (e.g., netlist analysis, timing report extraction), and the graph is executed by LLM agents equipped with domain-specific toolkits (SPICE, STA, DRC checkers). MARCO supports both dynamic and static task graphs, with hierarchical delegation and memory consolidation across agent groups. Reported metrics include up to 19.4% cell area reduction, 23.5% more LVS/DRC-clean layouts, and ~60× speedup on timing analysis.
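A static task graph of this kind can be sketched with the standard-library topological sorter. The task names, the dependency table, and the agent callables below are hypothetical placeholders; real agents would invoke LLMs and EDA tools rather than return strings.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set

Agent = Callable[[Dict[str, str]], str]


def run_task_graph(deps: Dict[str, Set[str]], agents: Dict[str, Agent]) -> Dict[str, str]:
    """Execute each sub-task after its prerequisites, passing their outputs along."""
    results: Dict[str, str] = {}
    for task in TopologicalSorter(deps).static_order():
        inputs = {dep: results[dep] for dep in deps.get(task, set())}
        results[task] = agents[task](inputs)
    return results


# Hypothetical three-node graph: each task maps to the tasks it depends on.
deps = {
    "netlist_analysis": set(),
    "timing_report": {"netlist_analysis"},
    "summary": {"netlist_analysis", "timing_report"},
}
agents = {
    "netlist_analysis": lambda inputs: "netlist parsed",
    "timing_report": lambda inputs: f"timing extracted after [{inputs['netlist_analysis']}]",
    "summary": lambda inputs: " | ".join(inputs[name] for name in sorted(inputs)),
}
results = run_task_graph(deps, agents)
print(results["summary"])
```

A dynamic variant would let an agent add nodes or edges while the graph runs; that is omitted here to keep the sketch short.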

3. MARCO Variants for Optimization and Reinforcement Learning

a. Combinatorial Optimization with Memory-Augmented RL

In "MARCO: A Memory-Augmented Reinforcement Framework for Combinatorial Optimization," the framework augments neural combinatorial optimization (NCO) search with explicit memory. At each step, the policy retrieves and aggregates nearest-neighbor solutions or actions from a global memory, forming a memory context h_t that guides both constructive and improvement-based NCO methods (Garmendia et al., 2024). This helps the search avoid local optima and redundant exploration. Parallel threads share the memory, enabling collaborative search. Empirical results for Maximum Cut, Maximum Independent Set, and TSP show both superior solution diversity and substantial inference speedup relative to memory-less or non-cooperative learning baselines.
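The retrieval step can be sketched as a k-nearest-neighbor lookup followed by aggregation. This is a simplified illustration: Euclidean distance and mean pooling are assumptions made here, and the function name is hypothetical; the paper's retrieval and aggregation mechanisms may differ.

```python
import math
from typing import List, Sequence


def retrieve_context(query: Sequence[float], memory: List[Sequence[float]], k: int = 2) -> List[float]:
    """Average the k stored vectors closest (in Euclidean distance) to the query."""
    if not memory:
        return [0.0] * len(query)  # empty memory contributes a neutral context
    nearest = sorted(memory, key=lambda m: math.dist(query, m))[:k]
    dim = len(query)
    return [sum(vec[i] for vec in nearest) / len(nearest) for i in range(dim)]


# Shared memory of previously visited solution embeddings (toy values).
memory = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
h_t = retrieve_context([0.4, 0.4], memory, k=2)
print(h_t)
```

Because the memory is a plain shared list, several search threads could append to it and read from it, which is the collaborative aspect the text describes.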

b. Multi-Agent RL for Cross-Domain Recommendation

In cross-domain recommendation, MARCO uses cooperative multi-agent RL, assigning one agent per source domain. Each agent estimates its domain’s contribution to the target (cold-start) domain, and all agents’ weights are coordinated via a decentralized partially observable Markov decision process (Dec-POMDP) (Xie et al., 6 Oct 2025). An entropy-based action diversity penalty discourages collapse onto a single domain strategy, improving both robustness and generalization in data-sparse regimes. Experiments on Amazon sub-category benchmarks with varying cold-start rates report 6–32% error reductions versus state-of-the-art single-agent approaches, with ablations indicating that both the multi-agent decomposition and the entropy regularization are critical.
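The effect of an entropy term on domain weights can be shown in a few lines. This sketch assumes the weights come from a softmax over per-domain scores and that the regularizer is added as `beta` times the Shannon entropy; both choices, and all names, are illustrative assumptions rather than the paper's exact formulation.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over per-domain scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: List[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def regularized_objective(reward: float, domain_scores: List[float], beta: float = 0.1) -> float:
    """Reward plus beta times the entropy of the domain weights (assumed form)."""
    return reward + beta * entropy(softmax(domain_scores))


balanced = regularized_objective(1.0, [0.0, 0.0, 0.0])
collapsed = regularized_objective(1.0, [10.0, 0.0, 0.0])
print(balanced, collapsed)
```

For equal reward, the balanced weighting scores higher than the collapsed one, which is the pressure against relying on a single source domain.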

c. Hardware-Aware Neural Architecture Search (NAS)

MARCO introduces mixed-precision NAS for edge devices via a two-agent (hardware/quantization) RL system and a lightweight Conformal Prediction (CP) filter for statistically safe pruning (Fayyazi et al., 16 Jun 2025). The system applies PPO under centralized training with decentralized execution (CTDE), using a centralized critic. CP surrogate models prune ~30% of candidate architectures before partial training, yielding 3–4× faster NAS with <0.3% accuracy loss and strict satisfaction of memory/latency constraints. Evaluation on the MAX78000 board confirms that real-world latency closely matches simulation (≤5% deviation).
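One way to picture a conformal filter is split conformal prediction on a latency surrogate: calibrate an error radius on held-out measurements, then discard candidates whose optimistic bound still violates the budget. The pruning rule, the function names, and the toy numbers below are assumptions for illustration and are not taken from the paper.

```python
import math
from typing import List


def conformal_radius(predicted: List[float], measured: List[float], alpha: float = 0.1) -> float:
    """Split-conformal radius: the ceil((n+1)(1-alpha))-th smallest absolute residual."""
    residuals = sorted(abs(p - m) for p, m in zip(predicted, measured))
    n = len(residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return residuals[rank - 1]


def keep_candidates(pred_latency: List[float], radius: float, budget: float) -> List[int]:
    """Indices whose lower latency bound (prediction - radius) still fits the budget."""
    return [i for i, p in enumerate(pred_latency) if p - radius <= budget]


# Toy calibration set: surrogate predictions vs. measured latencies (arbitrary units).
cal_pred = [10, 20, 30, 40, 50, 60, 70]
cal_meas = [11, 19, 32, 40, 47, 61, 70]
radius = conformal_radius(cal_pred, cal_meas, alpha=0.25)

# Candidates that cannot meet a budget of 50, even optimistically, are pruned.
kept = keep_candidates([45, 51, 53, 60], radius, budget=50)
print(radius, kept)
```

Under the usual exchangeability assumption, the true latency falls within `radius` of the prediction with probability at least 1 - alpha, so a candidate pruned by this rule is unlikely to have met the budget.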

d. Model-Based MARCO for Cooperative RL

MARCO (Centralized Model and Exploration Policy for Multi-Agent RL) tackles sample inefficiency in Dec-POMDPs by learning a centralized model (transition, reward, observation) and using it as a planning and exploration substrate (Zhang et al., 2021). The core idea is to exploit uncertainty estimates across an ensemble of neural models to compose an intrinsic exploration reward. Theoretical analysis gives a PAC bound on sample complexity, polynomial in environment parameters. In cooperative navigation and communication benchmarks, MARCO achieves up to 20× improvement in environment sample efficiency over model-free RL.
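The idea of turning ensemble disagreement into an exploration bonus can be sketched as follows. Here the bonus is assumed to be the variance of next-state predictions across ensemble members, averaged over state dimensions; the models are plain callables standing in for learned networks, and the exact uncertainty measure used in the paper may differ.

```python
from statistics import pvariance
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float], float], List[float]]


def intrinsic_reward(models: List[Model], state: Sequence[float], action: float) -> float:
    """Mean per-dimension variance of the ensemble's next-state predictions."""
    preds = [model(state, action) for model in models]
    dim = len(preds[0])
    return sum(pvariance([p[i] for p in preds]) for i in range(dim)) / dim


# Hypothetical two-member ensembles over a one-dimensional state.
agreeing = [lambda s, a: [s[0] + a], lambda s, a: [s[0] + a]]
disagreeing = [lambda s, a: [s[0] + a], lambda s, a: [s[0] + 3 * a]]

print(intrinsic_reward(agreeing, [0.0], 1.0))
print(intrinsic_reward(disagreeing, [0.0], 1.0))
```

State-action pairs where the members agree earn no bonus, while pairs where they disagree (typically poorly explored regions) earn a positive one, steering the exploration policy toward them.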

4. MARCO in Language, Reasoning, and Detoxification

a. Mask-and-Replace with Experts/Anti-Experts for Detoxification

MaRCo (Mask and Replace with Context) combines a denoising autoencoder with two expert LMs (non-toxic and toxic, respectively) in a Product-of-Experts (PoE) architecture (Hallinan et al., 2022). It identifies high-disagreement tokens for masking (via Jensen–Shannon divergence between expert and anti-expert token probabilities), and reconstructs masked spans by steering toward non-toxic completions and away from toxic ones. The α₁/α₂ weights control how aggressively text is detoxified and, unlike prior lexicon- or classifier-based approaches, the method is fully unsupervised and context-adaptive. Strong performance is reported on detoxifying subtly toxic text, surpassing both lexicon-driven and paraphrase-first baselines (~10.3% absolute toxicity reduction, 2.1× more preferred in human evaluations).
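Both steps, masking by divergence and steering by a product of experts, reduce to a few lines of arithmetic. The sketch below assumes per-token probability distributions from the two experts and combines logits as base + α₁·expert − α₂·anti-expert, a common PoE form; the function names, threshold, and toy distributions are illustrative and not the authors' code.

```python
import math
from typing import List


def kl(p: List[float], q: List[float]) -> float:
    """KL divergence in nats; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: List[float], q: List[float]) -> float:
    """Jensen-Shannon divergence between two distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def mask_positions(expert: List[List[float]], anti: List[List[float]], threshold: float) -> List[int]:
    """Token positions where expert and anti-expert disagree strongly."""
    return [i for i, (p, q) in enumerate(zip(expert, anti)) if js_divergence(p, q) > threshold]


def poe_logits(base: List[float], expert: List[float], anti: List[float],
               alpha1: float, alpha2: float) -> List[float]:
    """Steer base logits toward the expert and away from the anti-expert (assumed form)."""
    return [b + alpha1 * e - alpha2 * a for b, e, a in zip(base, expert, anti)]


# Toy two-token sequence over a two-word vocabulary.
expert_probs = [[0.5, 0.5], [1.0, 0.0]]
anti_probs = [[0.5, 0.5], [0.0, 1.0]]
print(mask_positions(expert_probs, anti_probs, threshold=0.3))
print(poe_logits([0.0, 0.0], [1.0, 0.0], [0.0, 1.0], alpha1=1.0, alpha2=1.0))
```

Raising `alpha1` or `alpha2` pushes replacements further from the anti-expert's preferences, which corresponds to more aggressive rewriting.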

b. Meta-Reflection and Cross-Referencing in Code Reasoning

In code reasoning, MARCO (Meta-Reflection with Cross-Referencing) formalizes a framework in which multiple LLM agents individually solve sequences of code tasks, but dynamically share summaries (meta-reflections) of prior problems and incorporate lessons from peers’ successes and failures (cross-referencing) (Zhao et al., 23 May 2025). The protocol grows and periodically condenses a knowledge bank, feeding it into prompts for new tasks. Mathematically, each agent’s proposal at iteration t for problem i conditions on its own trajectory, the bank, and peer lesson sets. Across code induction, deduction, and abduction benchmarks, MARCO delivers substantial accuracy improvements (e.g., +20% on RobustFill induction) over chain-of-thought and iterative reflection methods, with remaining bottlenecks due to context length and summary compression.
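The grow-and-condense behavior of the knowledge bank, and how it feeds a prompt, can be sketched as below. The condensation step here is a deliberately crude stand-in (deduplicate and keep the newest lessons) for what would be LLM-based summarization; the class, the prompt layout, and the lesson strings are all hypothetical.

```python
from typing import List


class KnowledgeBank:
    """Stores lessons from earlier problems and condenses them when the bank grows too large."""

    def __init__(self, max_items: int = 3) -> None:
        self.max_items = max_items
        self.lessons: List[str] = []

    def add(self, lesson: str) -> None:
        self.lessons.append(lesson)
        if len(self.lessons) > self.max_items:
            self.condense()

    def condense(self) -> None:
        # Stand-in for summarization: drop duplicates, then keep the newest items.
        unique = list(dict.fromkeys(self.lessons))
        self.lessons = unique[-self.max_items:]


def build_prompt(task: str, bank: KnowledgeBank, peer_lessons: List[str]) -> str:
    """Condition a new attempt on the agent's bank and on peers' lessons."""
    lines = ["Lessons from earlier problems:"]
    lines += [f"- {lesson}" for lesson in bank.lessons]
    lines.append("Lessons from peer agents on this problem:")
    lines += [f"- {lesson}" for lesson in peer_lessons]
    lines.append(f"Task: {task}")
    return "\n".join(lines)


bank = KnowledgeBank(max_items=2)
for lesson in ["check edge cases", "trace loops by hand", "check edge cases", "compare output types"]:
    bank.add(lesson)
print(build_prompt("infer the program from examples", bank, ["peer failed on empty input"]))
```

The fixed `max_items` cap is one simple way to respect the context-length bottleneck the text mentions.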

5. MARCO for Dense and Multilingual IR Benchmarking

In multilingual IR, mMARCO datasets expand the MS MARCO passage ranking benchmark from English to 13 languages via automated translation, supporting direct comparison under consistent retrieval metrics (Bonifacio et al., 2021). Experiments establish the effectiveness of multilingual rerankers (mT5, mMiniLM) and dense retrieval (mColBERT). Adding translated language data consistently improves cross-lingual zero-shot transfer, with a moderate but robust correlation (R² ≈ 0.33) between MT BLEU scores and downstream retrieval MRR.

The synthetic Urdu MS MARCO extends this by fully translating all queries and passages, and empirically validates that sequence-to-sequence multilingual transformers can efficiently bootstrap IR for resource-limited scripts (Butt et al., 2024).

6. Impact, Limitations, and Broader Implications

MARCO frameworks collectively advance the state of the art in heterogeneous domains by combining modular decomposition, multi-agent RL, and context-enriched memory or knowledge integration. They provide reproducible, extensible baselines for low-resource IR (Butt et al., 2024), efficiency improvements in HPC code optimization (Rahman et al., 6 May 2025), data-efficient combinatorial search (Garmendia et al., 2024), rapid and hardware-constrained NAS (Fayyazi et al., 16 Jun 2025), and robust task orchestration (Shrimal et al., 2024).

Limitations center on scaling multi-agent coordination, context length, and the need for richer, domain-adapted pretrained models. Many MARCO systems highlight future directions such as improved lesson selection in reasoning frameworks, scalable credit assignment in cross-domain systems, richer memory representations for combinatorial solvers, and extensions to other languages, modalities, or more complex domains. Societally, these frameworks lower the barriers for inclusion of underrepresented languages and automated, robust reasoning, but must contend with translation artifacts, bias amplification, and validation against real-world constraints. The open-source releases associated with several MARCO systems function as blueprints for further multi-agent, modular, and knowledge-augmented advances across computational fields (Butt et al., 2024, Rahman et al., 6 May 2025, Garmendia et al., 2024, Bonifacio et al., 2021, Shrimal et al., 2024, Ho et al., 25 Feb 2025, Fayyazi et al., 16 Jun 2025).