FullStack-Dev: Multi-Agent Framework

Updated 9 February 2026

FullStack-Dev is a multi-agent, planning-centric framework that automates the generation of production-grade web applications across frontend, backend, and database layers.
It employs explicit architectural modularization and schema-driven planning to ensure deterministic, type-safe code synthesis and effective error localization.
Empirical evaluations demonstrate significant improvements in full-stack code accuracy and debugging efficiency compared to prior approaches.

FullStack-Dev denotes a multi-agent, planning-centric full-stack coding framework designed to automate, optimize, and coordinate the generation of production-grade web applications encompassing frontend, backend, and database components. Distinguished by explicit architectural modularization, strong agent cooperation mechanisms, and domain-consistent outputs, FullStack-Dev combines advanced code planning, fine-grained codebase navigation, systematic code editing, and development-oriented testing. It addresses the operational and conceptual limitations of earlier code agents, which often restricted automation capabilities to the frontend or produced superficial, non-production code lacking robust data flow, storage, and back-end execution guarantees (Lu et al., 3 Feb 2026).

1. Architectural Principles and Multi-Agent Roles

At the core of FullStack-Dev is a pragmatic, agentic workflow mirroring the organization of professional software teams. The system orchestrates three primary agents:

Planning Agent: Accepts a natural-language instruction and decomposes it into two explicit, JSON-encoded plans: a backend plan ($P_{be}$: entities, API endpoints, business logic) and a frontend plan ($P_{fe}$: page/component hierarchies, frontend data flows). Strict schema validation ensures that subsequent stages operate with full type information, avoiding ambiguity and guesswork during code generation and integration.
Backend Coding Agent: Consumes $P_{be}$ and incrementally synthesizes all backend endpoints, employing a suite of file system APIs (read_file, write_file, run_shell_command) and an embedded debugging tool. Each API is tested in situ; output and logs are validated against the specification, providing immediate feedback and convergence criteria.
Frontend Coding Agent: Builds the interactive UI and orchestrates data flows using $P_{fe}$ in conjunction with backend API definitions. A dedicated frontend debugging tool executes scripted user actions in a headless browser and, in case of errors, invokes a GUI analysis agent to identify the responsible code regions for remediation.

A key feature is agentic bug localization: error trajectories are traced back to concrete user actions (e.g., “Clicking Submit on Form X caused a 500 on /api/items”), and repairs are prioritized by context-aware analysis (Lu et al., 3 Feb 2026).
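The idea of tracing an error back to a user action can be illustrated with a minimal sketch. The `ActionRecord` shape and the `localize_failure` helper are hypothetical names introduced here for illustration, not part of the framework's documented interface; the sketch assumes that each scripted action is logged together with the HTTP status it produced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActionRecord:
    """One scripted user action and the HTTP call it triggered (hypothetical shape)."""
    action: str    # e.g. "Clicking Submit on Form X"
    endpoint: str  # e.g. "/api/items"
    status: int    # HTTP status observed after the action


def localize_failure(trajectory: list[ActionRecord]) -> Optional[str]:
    """Describe the first action in the trajectory that produced an HTTP error."""
    for record in trajectory:
        if record.status >= 400:
            return f"{record.action} caused a {record.status} on {record.endpoint}"
    return None  # no failing action found


trajectory = [
    ActionRecord("Opening the Items page", "/api/items", 200),
    ActionRecord("Clicking Submit on Form X", "/api/items", 500),
]
print(localize_failure(trajectory))
```

A real GUI analysis agent would additionally map the failing action to code regions; that step is omitted here because the source does not specify how it is done.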

2. Planning Module and Schema-Driven Specification

The Planning Agent constitutes a schema-centric, prompt-driven LLM component:

Formal mapping:

$\text{Plan}: \mathcal{U} \to \mathcal{P}, \quad \mathcal{P} = \{ (P_{be}, P_{fe}) \mid P_{be} \in \text{JSON}_{be},\; P_{fe} \in \text{JSON}_{fe} \}$

where $\mathcal{U}$ is the space of user instructions.

Operation:

User instruction $u$ is mapped to a standard planning prompt.
The resultant LLM output is strictly parsed as JSON.
Downstream agents enforce adherence to declared types (e.g., arrays, object structures), guaranteeing deterministic, type-safe code generation.

Schema-compliant planning ensures deterministic orchestration even as complexity scales, and it restricts the propagation of semantic errors across the agentic pipeline (Lu et al., 3 Feb 2026).
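A minimal sketch of strict plan parsing, under stated assumptions, might look as follows. The key names (`backend`, `frontend`, `entities`, `endpoints`, `pages`, `data_flows`) and the `parse_plan` helper are hypothetical; the source only states that the plans are JSON-encoded and that declared types such as arrays and objects are enforced.

```python
import json

# Hypothetical minimal schemas: required key -> expected Python type.
BACKEND_SCHEMA = {"entities": list, "endpoints": list}
FRONTEND_SCHEMA = {"pages": list, "data_flows": list}


class PlanError(ValueError):
    """Raised when LLM output is not a schema-compliant plan."""


def _check(section: object, schema: dict, name: str) -> dict:
    """Verify that a plan section is an object with correctly typed keys."""
    if not isinstance(section, dict):
        raise PlanError(f"{name} must be a JSON object")
    for key, expected in schema.items():
        if key not in section:
            raise PlanError(f"{name} is missing '{key}'")
        if not isinstance(section[key], expected):
            raise PlanError(f"{name}.{key} must be of type {expected.__name__}")
    return section


def parse_plan(raw: str) -> tuple[dict, dict]:
    """Strictly parse LLM output into (P_be, P_fe), or raise PlanError."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise PlanError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(doc, dict):
        raise PlanError("top-level plan must be a JSON object")
    p_be = _check(doc.get("backend"), BACKEND_SCHEMA, "backend")
    p_fe = _check(doc.get("frontend"), FRONTEND_SCHEMA, "frontend")
    return p_be, p_fe


raw = (
    '{"backend": {"entities": ["Item"], "endpoints": ["GET /api/items"]},'
    ' "frontend": {"pages": ["ItemsPage"], "data_flows": []}}'
)
p_be, p_fe = parse_plan(raw)
print(p_be["endpoints"], p_fe["pages"])
```

Rejecting malformed output at this boundary is what keeps a planning mistake from silently reaching the coding agents.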

3. Code Editing, Navigation, and Development-Oriented Testing

FullStack-Dev’s code synthesis process is underpinned by a set of API primitives for direct manipulation of the project’s file system and runtime environment:

Primitives: Access and mutation routines (read_file, write_file, replace, list_directory, glob, search_file_content, run_shell_command).
Development-Oriented Testing:
- Backend: backend_test(dir,cmd,ports,url,method,payload) executes live HTTP requests, intercepts response/console output, and flags deviations from the specification (non-200 codes, malformed payloads).
- Frontend: frontend_test(dir,cmd,ports,instruction) orchestrates user-event scripts, collects screenshots, logs errors, and records both functionality and appearance scores in the [1,5] range.
- Error Analytics: Any test failure triggers the GUI agent to associate the fault with the corresponding user action and code region, expediting targeted repair (Lu et al., 3 Feb 2026).
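The response-checking step of backend testing can be sketched as a pure function. This is an illustration only: `check_backend_response` and its `required_fields` parameter are hypothetical, and the sketch covers just the two deviation types named above (non-200 codes and malformed payloads), not server startup or request dispatch.

```python
import json


def check_backend_response(
    status: int, body: str, required_fields: dict[str, type]
) -> list[str]:
    """Return a list of deviations from the spec; an empty list means a pass."""
    problems = []
    if status != 200:
        problems.append(f"non-200 status code: {status}")
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        problems.append("malformed payload: body is not valid JSON")
        return problems
    if not isinstance(payload, dict):
        problems.append("malformed payload: expected a JSON object")
        return problems
    for field, expected in required_fields.items():
        if field not in payload:
            problems.append(f"malformed payload: missing field '{field}'")
        elif not isinstance(payload[field], expected):
            problems.append(f"malformed payload: '{field}' is not {expected.__name__}")
    return problems


spec = {"id": int, "name": str}
print(check_backend_response(200, '{"id": 1, "name": "widget"}', spec))
print(check_backend_response(500, "Internal Server Error", spec))
```

Returning every deviation, rather than stopping at the first, gives the coding agent more context per test run.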

Ablation studies show that removing the specialized debugging tools or abandoning the agent split increases the average number of repair iterations by about 40.6 (from $74.9$ to $115.5$, a relative increase of roughly 54%), underscoring their necessity for efficient, coordinated convergence.
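What counts as a repair iteration can be made concrete with a small test-and-fix loop. This is a sketch under assumptions: `repair_until_passing`, `run_tests`, and `apply_fix` are hypothetical names, and fixing one failure per iteration is a simplification chosen for illustration rather than the framework's documented policy.

```python
from typing import Callable


def repair_until_passing(
    run_tests: Callable[[], list[str]],
    apply_fix: Callable[[str], None],
    max_iterations: int = 10,
) -> tuple[bool, int]:
    """Alternate testing and fixing; return (converged, repair_iterations_used)."""
    iterations = 0
    while True:
        failures = run_tests()
        if not failures:
            return True, iterations       # all tests pass
        if iterations >= max_iterations:
            return False, iterations      # budget exhausted without convergence
        apply_fix(failures[0])            # repair one failure at a time
        iterations += 1


# Toy stand-in for a project with two known defects.
open_bugs = ["500 on /api/items", "missing field 'name'"]
result = repair_until_passing(
    run_tests=lambda: list(open_bugs),
    apply_fix=lambda failure: open_bugs.remove(failure),
)
print(result)  # (True, 2)
```

In this framing, better error localization lowers the iteration count because each call to `apply_fix` is more likely to resolve the failure it targets.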

4. Integration with LLM Training, Benchmarks, and Evaluation

FullStack-Dev is coupled with upstream model improvement and comprehensive evaluation pipelines:

Self-Improving LLMs (FullStack-Learn): Repository back-translation, which transforms real-world Next.js/NestJS repositories into planning/action pairs, augments LLMs with strong in-domain priors. This process improves code synthesis on full-stack tasks, with a reported gain of $9.7\%$ on frontend tasks and further gains on backend and database tasks (Lu et al., 3 Feb 2026).
Comprehensive Benchmarking (FullStack-Bench): Solutions are automatically benchmarked on 647 frontend, 604 backend, and 389 database scenarios. Accuracy is scored according to the FullStack-Bench accuracy formula.

Partial verdicts are supported in GUI-driven evaluations.

Performance Outcomes:
- With Qwen3-Coder-480B-A35B-Instruct, results are reported as frontend (FE), backend (BE), and database (DB) accuracy, together with an appearance score.
- FullStack-Dev surpasses the prior SOTA (WebGen-Agent) on FE, BE, and DB accuracy, confirming the gains from multi-agent planning and targeted debugging.

5. Workflow Robustness, Error Localization, and Practical Implications

Robustness stems from a combination of methodical navigation, deterministic agent plans, and fine-grained bug localization:

File Paths are constructed as absolute within a monorepo root, ensuring portability and isolation.
Agent Loops maintain context and repair state, which enables convergence in environments with complex or incomplete dependency graphs.
Error Resolution: Attempted repair iterations are minimized; the performance study reports that efficiency degrades by 50% without the development-oriented testing mechanisms.
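One way to construct absolute paths confined to a monorepo root is sketched below. The helper `resolve_in_root` and the example path `apps/api/main.ts` are hypothetical; `read_file` and `write_file` borrow the primitive names mentioned in the text, but their signatures here are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def resolve_in_root(root: Path, relative: str) -> Path:
    """Return an absolute path under root, rejecting paths that escape it."""
    root = root.resolve()
    candidate = (root / relative).resolve()
    if not candidate.is_relative_to(root):  # requires Python 3.9+
        raise ValueError(f"path escapes project root: {relative}")
    return candidate


def write_file(root: Path, relative: str, content: str) -> Path:
    """Write content to a file inside root, creating parent directories."""
    target = resolve_in_root(root, relative)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return target


def read_file(root: Path, relative: str) -> str:
    """Read a file inside root as UTF-8 text."""
    return resolve_in_root(root, relative).read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    project_root = Path(tmp)
    written = write_file(project_root, "apps/api/main.ts", "export {};\n")
    print(written.is_absolute(), read_file(project_root, "apps/api/main.ts") == "export {};\n")
```

Resolving every path against a single root before touching the file system keeps an agent's edits isolated to the project, which is the property the text attributes to absolute path construction.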

A plausible implication is that such structural partitioning and sophisticated context-tracking routines are prerequisites for scalable, low-touch automation in large enterprise settings spanning multiple application tiers (Lu et al., 3 Feb 2026).

6. Limitations and Future Directions

Although FullStack-Dev demonstrates competitive or superior accuracy and convergence properties, several open challenges remain:

Scalability: Multi-agent planning may require more advanced caching and memory strategies for large, monorepo-scale applications.
Code Drift: Excess TDD iterations introduce regression risks, motivating the integration of more advanced rollback and branching strategies.
Formatting Robustness: Strict adherence to XML/JSON output is more brittle on open-source LLMs compared to proprietary models, motivating future work towards “output-agency” and robust code+visual reasoning (Wan et al., 29 Sep 2025).
Context-Aware Expansion: As applications with dozens of interdependent pages become the norm, more intelligent context management and agent cooperation strategies are critical.

Integration of FullStack-Dev with self-improving LLM pipelines and comprehensive benchmarks establishes a reproducible foundation for further research in robust, agentic full-stack software synthesis.

References (2)

FullStack-Agent: Enhancing Agentic Full-Stack Web Coding via Development-Oriented Testing and Repository Back-Translation (2026)

Automatically Generating Web Applications from Requirements Via Multi-Agent Test-Driven Development (2025)
