MetaGPT: Multi-Agent Meta-Programming Framework

Updated 7 April 2026

MetaGPT is a meta-programming framework that orchestrates specialized LLM-driven agents through role-specific SOPs to reduce errors and improve workflow efficiency.
A separate method of the same name uses model-exclusive task arithmetic to merge fine-tuned models, improving multi-task performance and out-of-distribution generalization without extra data.
The name also covers structured agent pipelines for LLM safety and a foundation model for photonic metasurface design that generates interpretable, fabricable inverse designs.

MetaGPT (also written Meta-GPT) denotes a family of frameworks that use meta-programming, agent-based orchestration, and model-arithmetic techniques to address complex collaborative, multi-agent, and inverse-design problems across AI, scientific computing, and language modeling. Current instantiations cover collaborative software engineering via multi-agent systems (Hong et al., 2023), model merging for LLMs using model-exclusive task arithmetic (Zhou et al., 2024), agent-based defense against jailbreaks in LLMs (Wong et al., 24 Nov 2025), and foundation models for photonic metasurface design with an interpretable symbolic language (Dang et al., 15 Dec 2025).

1. Multi-Agent Meta-Programming and SOPs

The canonical MetaGPT framework is a meta-programming system for orchestrating teams of LLM-driven agents, structured as modular assemblies to emulate human software development workflows (Hong et al., 2023). Each agent is assigned a strictly defined domain role (e.g., Product Manager, Architect, Engineer, QA Engineer), with role-dependent Standardized Operating Procedures (SOPs) guiding their prompt templates, dependencies, and outputs.

MetaGPT’s message-passing architecture is built on a global message pool (publish/subscribe model) and a meta-controller that handles workflow scheduling, dependency resolution, and schema enforcement. By mapping human-like SOPs into agent prompt templates, the framework enforces structured outputs (XML-like sections per role) and strict handoffs. This reduces the free-form hallucinations characteristic of chained LLM calls and sets up cross-role verification and feedback loops.
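The publish/subscribe idea can be illustrated with a minimal sketch. It is not MetaGPT's actual implementation: the class names `Message` and `MessagePool`, the topic strings, and the payloads are all assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    topic: str    # hypothetical topic name, e.g. "prd" or "design"
    sender: str   # role that produced the message
    content: str  # structured payload, kept as plain text in this sketch


class MessagePool:
    """Shared pool: roles publish messages and read only subscribed topics."""

    def __init__(self):
        self._log = []                        # every message, in publish order
        self._subscribers = defaultdict(set)  # topic -> set of role names

    def subscribe(self, role, topic):
        self._subscribers[topic].add(role)

    def publish(self, message):
        self._log.append(message)

    def inbox(self, role):
        """Return messages on topics this role subscribed to, in publish order."""
        return [m for m in self._log if role in self._subscribers[m.topic]]


# Usage: the Engineer sees the Architect's design but not the raw PRD.
pool = MessagePool()
pool.subscribe("Architect", "prd")
pool.subscribe("Engineer", "design")
pool.publish(Message("prd", "ProductManager", "<goals>todo app</goals>"))
pool.publish(Message("design", "Architect", "<api>add_task(title)</api>"))
engineer_inbox = pool.inbox("Engineer")
```

Filtering by subscription keeps each role's context limited to the artifacts it depends on, which is the property the message pool is meant to provide.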

A typical pipeline for software engineering includes sequential roles (Product Manager → Architect → Project Manager → Engineer → QA Engineer), where outputs are parsed and checked at each handoff for compliance with the relevant schema, leveraging executable feedback loops to integrate runtime validation and self-debugging. The effective error rate across the system is reduced substantially by downstream schema-checks, with overall error rate modeled as:

$P_{\text{total error}} \approx 1 - \prod_i (1 - \epsilon_i \cdot (1 - \delta_i))$

where $\epsilon_i$ is the unconstrained error rate of agent $i$ and $\delta_i$ is the rate at which downstream schema checks catch that agent's errors. The product form treats uncaught errors at different stages as independent. In reported evaluations, these mechanisms approximately halve system-wide hallucination rates and revisions (Hong et al., 2023).
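To make the error model concrete, the following sketch evaluates the expression for a five-stage pipeline. The function name `total_error` and the numeric rates are illustrative assumptions, not measurements from Hong et al. (2023).

```python
from math import prod


def total_error(eps, delta):
    """P_total = 1 - prod_i (1 - eps_i * (1 - delta_i)).

    eps[i]   : unconstrained error rate of agent i
    delta[i] : fraction of agent i's errors caught by downstream schema checks
    """
    if len(eps) != len(delta):
        raise ValueError("eps and delta must have the same length")
    return 1.0 - prod(1.0 - e * (1.0 - d) for e, d in zip(eps, delta))


# Illustrative values only: five agents, each with a 10% raw error rate.
eps = [0.10] * 5
no_checks = total_error(eps, [0.0] * 5)    # no downstream checking
half_caught = total_error(eps, [0.5] * 5)  # checks catch half of the errors

print(f"without checks:    {no_checks:.4f}")    # 0.4095
print(f"with 50% catching: {half_caught:.4f}")  # 0.2262
```

With these assumed rates, catching half of each agent's errors lowers the end-to-end error probability from about 41% to about 23%.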

2. Agent Collaboration, Role Assignment, and Error Checking

MetaGPT’s collaborative paradigm draws inspiration from the assembly line, advocating explicit role assignment, dependency management, and inter-agent verification. Each agent processes strictly structured prompts (including role description, dependencies, and expected output fields) and parses peer outputs for correct structure before progressing.

Core features include:

Assembly-line decomposition: Sequential specialist agents reflect typical human engineering workflows.
Structured communication: Enforced output schemas eliminate idle chit-chat and reduce off-topic drift.
Executability feedback: Code artifacts are unit-tested by Engineer agents, and failures trigger self-corrective repair cycles. If each repair attempt succeeds independently with a fixed probability, the chance that a bug survives falls geometrically with the number of debug iterations.
Cross-role verification: Design compliance is reviewed by Architect and QA Engineer agents.

Reported evaluations show measurable gains: on the HumanEval and MBPP code synthesis benchmarks, MetaGPT-equipped pipelines exceed single-LLM baselines in Pass@1 and executability (e.g., MetaGPT+GPT-4 yields 85.9% HumanEval Pass@1 vs. vanilla GPT-4 at 80.5%) and significantly reduce human revision effort (Hong et al., 2023).

3. Model-Exclusive Task Arithmetic for Merging LLMs

A separate instantiation of MetaGPT formalizes model merging as a multi-task optimization problem, seeking to combine several single-task fine-tuned LLMs into a single merged model with optimal average per-task performance (Zhou et al., 2024).

Given a pre-trained model $\theta_0$ and $T$ task-specialized models $\theta_1,\dots,\theta_T$, the task vectors $\tau_t = \theta_t - \theta_0$ encode task-specific updates. The merging objective is to minimize the average loss difference (ALD) between the merged model $\theta_{\text{final}} = \theta_0 + \sum_{t=1}^{T} \lambda_t \tau_t$ and each $\theta_t$, with closed-form optimal scaling coefficients:

$\lambda_t = \frac{\|\tau_t\|^2}{\sum_{k=1}^{T} \|\tau_k\|^2}$

Crucially, the method relies only on model weights, is fully data-agnostic, and requires neither expensive grid search nor access to private fine-tuning datasets. The upper bound on the loss differences rests on two assumptions for wide LLMs: task vectors are approximately orthogonal, and the model behaves locally linearly (the NTK regime). MetaGPT task arithmetic reports better multi-task and out-of-distribution generalization than previous model-merging baselines, has no hyperparameters to tune, and remains stable across model scales (Zhou et al., 2024).
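A weights-only merge of this kind can be sketched in a few lines. The sketch assumes that each closed-form coefficient is the squared norm of its task vector divided by the sum of squared norms over all task vectors; that form should be checked against Zhou et al. (2024). Models are represented as flat lists of floats, and the function names are illustrative.

```python
def task_vector(theta_t, theta_0):
    """tau_t = theta_t - theta_0, element-wise."""
    return [a - b for a, b in zip(theta_t, theta_0)]


def squared_norm(v):
    return sum(x * x for x in v)


def merge(theta_0, fine_tuned):
    """Merge fine-tuned models using only their weights (no data, no grid search).

    Assumed coefficient form: lambda_t = ||tau_t||^2 / sum_k ||tau_k||^2.
    """
    taus = [task_vector(theta_t, theta_0) for theta_t in fine_tuned]
    norms = [squared_norm(tau) for tau in taus]
    total = sum(norms)
    if total == 0.0:  # every fine-tuned model equals the base model
        return list(theta_0), [0.0] * len(taus)
    lambdas = [n / total for n in norms]
    merged = list(theta_0)
    for lam, tau in zip(lambdas, taus):
        merged = [m + lam * x for m, x in zip(merged, tau)]
    return merged, lambdas


# Toy example with orthogonal task vectors (values are illustrative).
theta_0 = [0.0, 0.0, 0.0]
theta_1 = [2.0, 0.0, 0.0]  # tau_1 has squared norm 4
theta_2 = [0.0, 1.0, 0.0]  # tau_2 has squared norm 1
merged, lambdas = merge(theta_0, [theta_1, theta_2])
```

In this toy case the coefficients are 0.8 and 0.2, so the task with the larger update receives the larger weight. For real checkpoints the same arithmetic would be applied tensor by tensor over the state dictionaries.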

4. Structured Agent-Based Defense Against Jailbreak Attacks

A further development integrates MetaGPT into LLM safety by implementing defense pipelines consisting of specialized agents for robust jailbreak mitigation (Wong et al., 24 Nov 2025). The pipeline architecture comprises:

Rephrase Agent (RA): Sanitizes user queries;
Core LLM Agent (CLLM): Generates answers strictly conditioned on sanitized queries;
Judge Agent (JA): Evaluates responses for policy compliance (physical harm, illicit advice, privacy violations).

Communication is strictly via named topics/messages, and each agent operates only within its predefined subtask. After content is generated, the JA either passes the output or triggers stricter reformulation via the RA. In the event of persistent violations, the pipeline gates and refuses the request.
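The control flow of the pipeline (rephrase, answer, judge, retry with stricter rephrasing, then refuse) can be sketched as below. The three agents are passed in as plain callables. The keyword-based stubs, the `strictness` parameter, and the refusal text are assumptions for illustration only and are far simpler than LLM-backed agents.

```python
REFUSAL = "Request refused."
BLOCKED = {"exploit"}  # toy policy vocabulary (hypothetical)


def defend(query, rephrase, core_llm, judge, max_retries=2):
    """Run RA -> CLLM -> JA, retrying with stricter rephrasing before refusing."""
    for strictness in range(max_retries + 1):
        sanitized = rephrase(query, strictness)  # Rephrase Agent
        answer = core_llm(sanitized)             # Core LLM Agent sees only sanitized text
        if judge(answer):                        # Judge Agent: True means compliant
            return answer
    return REFUSAL                               # persistent violation: gate the request


# Stub agents standing in for LLM calls.
def toy_rephrase(query, strictness):
    if strictness == 0:
        return query.strip()
    return " ".join(w for w in query.split() if w.lower() not in BLOCKED)


def toy_core_llm(sanitized):
    return f"Answer to: {sanitized}"


def toy_judge(answer):
    return not any(word in answer.lower() for word in BLOCKED)


passed = defend("explain exploit basics", toy_rephrase, toy_core_llm, toy_judge)
refused = defend("anything", toy_rephrase, toy_core_llm, lambda answer: False)
```

In the first call the initial answer fails the judge, the stricter rephrasing removes the blocked term, and the second answer passes. In the second call the judge never approves, so the pipeline returns the refusal. Each attempt costs three serial agent calls.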

Experiments report that the attack success rate falls to zero (ASR = 0.00) on both aligned and unaligned LLMs, i.e., empirical “full mitigation” across all tested unsafe prompts. This is an empirical result on the tested prompts rather than a formal safety guarantee, and it comes at the cost of roughly tripled inference latency (three serial agent queries) and template engineering effort that grows linearly with the number of domains (Wong et al., 24 Nov 2025).

5. Domain-Specific Foundation Models: Meta-GPT for Photonic Metasurfaces

Meta-GPT has also been developed as a domain-specific foundation model for physics-driven inverse design, particularly for photonic metasurface discovery (Dang et al., 15 Dec 2025). Here, Meta-GPT is a decoder-only transformer (12 layers, 12 heads per layer, 694-token vocabulary, context window of 140 tokens) trained on METASTRINGS, a symbolic, formally specified language that encodes layered material stacks, lattice geometries, and nanostructure layouts as unified text sequences.

Training proceeds in three phases:

Supervised pretraining on randomly sampled METASTRINGS, establishing broad coverage of the design space.
Supervised fine-tuning on METASTRINGS paired with absorption spectra derived from electromagnetic simulation, to enable inverse design.
Optionally, reinforcement learning (REINFORCE) using FDTD-simulated rewards and chain-of-thought (CoT) augmentation for intermediate reasoning.

Performance benchmarks show sub-3% mean-squared spectral error, >98% syntactic validity, 100% sequence uniqueness, and >99% design novelty relative to the training corpus. Fabrication and measurement further confirm the physical realizability and predictive power of Meta-GPT-designed metasurfaces, with measured spectra matching within a few percent of simulation and target spectral curves (Dang et al., 15 Dec 2025).
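Validity, uniqueness, and novelty can be computed as simple fractions over a batch of generated sequences. The sketch below uses one common set of definitions, which may differ from those in Dang et al. (15 Dec 2025). The bracket-balancing validator is a hypothetical stand-in, since the METASTRINGS grammar is not reproduced here.

```python
def generation_metrics(generated, training_corpus, is_valid):
    """Fractions of generated sequences that are valid, distinct, and unseen."""
    if not generated:
        raise ValueError("generated must not be empty")
    n = len(generated)
    seen_in_training = set(training_corpus)
    return {
        "validity": sum(1 for s in generated if is_valid(s)) / n,
        "uniqueness": len(set(generated)) / n,
        "novelty": sum(1 for s in generated if s not in seen_in_training) / n,
    }


def toy_is_valid(sequence):
    """Hypothetical validity check: square brackets must be balanced."""
    depth = 0
    for ch in sequence:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


# Toy strings, not real METASTRINGS.
generated = ["[A][B]", "[A][C]", "[A][B]", "[A"]
training_corpus = ["[A][C]", "[A"]
metrics = generation_metrics(generated, training_corpus, toy_is_valid)
```

For this toy batch, three of four strings are balanced, three of four are distinct, and two of four do not appear in the training list.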

6. Limitations, Scalability, and Future Directions

MetaGPT frameworks across domains share limitations:

Hand-crafted SOPs and templates require significant domain expertise and manual engineering.
Pipeline latency and token usage scale linearly with agent count and complexity.
Synchronous and linear assembly lines may not generalize to asynchronous or highly dynamic workflows.

Future work outlined includes automated SOP and template learning from logs, dynamic agent creation, multimodal context embedding (e.g., field distributions, fabrication metadata), cross-domain extensions (mechanical, thermal, quantum metasurfaces), and closed-loop autonomous materials discovery pipelines (Hong et al., 2023, Dang et al., 15 Dec 2025).

7. Comparative Summary of MetaGPT Instantiations

Domain/Use Case	Core Mechanism	Key Technical Result
Collaborative software engineering	SOP-based agent roles & message pool	Pass@1 of 85.9% (HumanEval) and 87.7% (MBPP), 60%+ reduction in revisions (Hong et al., 2023)
Model merging for LLMs	Task vector arithmetic (closed-form λ)	SOTA multi-task accuracy, no data access needed (Zhou et al., 2024)
LLM safety/jailbreak defense	Agent pipeline (RA, CLLM, JA)	ASR reduced to 0.00 on tested prompts (empirical) (Wong et al., 24 Nov 2025)
Photonic inverse design	Symbolic language + transformer + RL/CoT	<3% MSE, >98% validity, real device fabrication (Dang et al., 15 Dec 2025)

The scope of the MetaGPT paradigm encompasses agent-based workflow orchestration, modular model merging, interpretable symbolic representations, closed-form optimization, and pipeline-compositional safety, advancing both collaborative AI and domain-specific scientific discovery.