
FullStack-Dev: Multi-Agent Framework

Updated 9 February 2026
  • FullStack-Dev is a multi-agent, planning-centric framework that automates the generation of production-grade web applications across frontend, backend, and database layers.
  • It employs explicit architectural modularization and schema-driven planning to ensure deterministic, type-safe code synthesis and effective error localization.
  • Empirical evaluations demonstrate significant improvements in full-stack code accuracy and debugging efficiency compared to prior approaches.

FullStack-Dev is a multi-agent, planning-centric full-stack coding framework designed to automate and coordinate the generation of production-grade web applications spanning frontend, backend, and database components. It is distinguished by explicit architectural modularization, strong agent cooperation mechanisms, and domain-consistent outputs, and it combines code planning, fine-grained codebase navigation, systematic code editing, and development-oriented testing. It addresses the limitations of earlier code agents, which often restricted automation to the frontend or produced superficial, non-production code lacking robust data flow, storage, and backend execution guarantees (Lu et al., 3 Feb 2026).

1. Architectural Principles and Multi-Agent Roles

At the core of FullStack-Dev is a pragmatic, agentic workflow mirroring the organization of professional software teams. The system orchestrates three primary agents:

  • Planning Agent: Accepts a natural-language instruction and decomposes it into two explicit, JSON-encoded plans: a backend plan ($P_{be}$: entities, API endpoints, business logic) and a frontend plan ($P_{fe}$: page/component hierarchies, frontend data flows). Strict schema validation ensures that subsequent stages operate with full type information, avoiding ambiguity and guesswork during code generation and integration.
  • Backend Coding Agent: Consumes $P_{be}$ and incrementally synthesizes all backend endpoints, employing a suite of file system APIs (read_file, write_file, run_shell_command) and an embedded debugging tool. Each API is tested in situ; output and logs are validated against the specification, providing immediate feedback and convergence criteria.
  • Frontend Coding Agent: Builds the interactive UI and orchestrates data flows using $P_{fe}$ in conjunction with backend API definitions. A dedicated frontend debugging tool executes scripted user actions in a headless browser and, in case of errors, invokes a GUI analysis agent to identify the responsible code regions for remediation.

A key feature is agentic bug localization: error trajectories are traced back to concrete user actions (e.g., “Clicking Submit on Form X caused a 500 on /api/items”), and repairs are prioritized by context-aware analysis (Lu et al., 3 Feb 2026).
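The idea of tying a failure to the user action that triggered it can be illustrated with a minimal sketch. The `TraceEvent` structure and the `localize_failure` function below are hypothetical names introduced for illustration only; the source does not describe the framework's internal data structures, and the real GUI analysis agent is LLM-driven rather than rule-based.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceEvent:
    """One step of a scripted UI test (hypothetical structure, not from the source)."""
    action: str   # human-readable user action
    method: str   # HTTP method triggered by the action
    url: str      # endpoint the action called
    status: int   # HTTP status code observed


def localize_failure(trace: list[TraceEvent]) -> Optional[str]:
    """Return a report for the first action whose request failed, or None."""
    for event in trace:
        if event.status >= 400:
            return f"{event.action} caused a {event.status} on {event.url}"
    return None


trace = [
    TraceEvent("Opening the Items page", "GET", "/api/items", 200),
    TraceEvent("Clicking Submit on Form X", "POST", "/api/items", 500),
]
report = localize_failure(trace)
print(report)
```

Reporting the first failing action, rather than every error seen, keeps the repair prompt focused on the earliest point where behavior diverged from the specification.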

2. Planning Module and Schema-Driven Specification

The Planning Agent constitutes a schema-centric, prompt-driven LLM component:

  • Formal mapping:

$$\text{Plan}: \mathcal{U} \to \mathcal{P}, \quad \mathcal{P} = \{ (P_{be}, P_{fe}) \mid P_{be} \in \text{JSON}_{be},\; P_{fe} \in \text{JSON}_{fe} \}$$

where $\mathcal{U}$ is the space of user instructions.

  • Operation:
  1. A user instruction $u$ is mapped to a standard planning prompt.
  2. The resultant LLM output is strictly parsed as JSON.
  3. Downstream agents enforce adherence to declared types (e.g., arrays, object structures), guaranteeing deterministic, type-safe code generation.

Schema-compliant planning ensures deterministic orchestration even as complexity scales, and it restricts the propagation of semantic errors across the agentic pipeline (Lu et al., 3 Feb 2026).
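As a rough illustration of strict parsing followed by type enforcement, the sketch below checks a plan's top-level keys against declared types. The schema keys (`entities`, `endpoints`, `pages`) and the `parse_plan` helper are assumptions made for this example; the source does not publish the actual plan schema.

```python
import json

# Hypothetical minimal schemas: top-level key -> required Python type.
BACKEND_SCHEMA = {"entities": list, "endpoints": list}
FRONTEND_SCHEMA = {"pages": list}


def parse_plan(raw: str, schema: dict) -> dict:
    """Strictly parse model output as JSON and check declared top-level types."""
    plan = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON
    if not isinstance(plan, dict):
        raise ValueError("plan must be a JSON object")
    for key, expected in schema.items():
        if key not in plan:
            raise ValueError(f"missing key: {key}")
        if not isinstance(plan[key], expected):
            raise ValueError(f"{key} must be of type {expected.__name__}")
    return plan


raw_backend = '{"entities": ["Item"], "endpoints": [{"path": "/api/items", "method": "GET"}]}'
backend_plan = parse_plan(raw_backend, BACKEND_SCHEMA)
print(backend_plan["endpoints"][0]["path"])
```

Rejecting a malformed plan at this boundary means a downstream coding agent never has to guess whether a field is an array or an object.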

3. Code Editing, Navigation, and Development-Oriented Testing

FullStack-Dev’s code synthesis process is underpinned by a robust set of API primitives for direct manipulation of the project’s file system and runtime environment:

  • Primitives: Access and mutation routines (read_file, write_file, replace, list_directory, glob, search_file_content, run_shell_command).
  • Development-Oriented Testing:
    • Backend: backend_test(dir,cmd,ports,url,method,payload) executes live HTTP requests, intercepts response/console output, and flags deviations from the specification (non-200 codes, malformed payloads).
    • Frontend: frontend_test(dir,cmd,ports,instruction) orchestrates user-event scripts, collects screenshots, logs errors, and records both functionality and appearance scores in the [1,5] range.
    • Error Analytics: Any test failure triggers the GUI agent to associate the fault with the corresponding user action and code region, expediting targeted repair (Lu et al., 3 Feb 2026).
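The kind of deviation check described for backend testing (non-200 codes, malformed payloads) might look like the following sketch. It operates on an already captured status code and body so that it runs without a live server; `check_backend_response` and its `required_fields` parameter are hypothetical and are not the signature of the framework's `backend_test` tool.

```python
import json


def check_backend_response(status: int, body: str, required_fields: list[str]) -> list[str]:
    """Return a list of deviations from the spec; an empty list means the check passed."""
    problems = []
    if status != 200:
        problems.append(f"unexpected status {status}")
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        problems.append("response body is not valid JSON")
        return problems
    if not isinstance(payload, dict):
        problems.append("response body is not a JSON object")
        return problems
    for field in required_fields:
        if field not in payload:
            problems.append(f"missing field: {field}")
    return problems


print(check_backend_response(200, '{"id": 1, "name": "pen"}', ["id", "name"]))
print(check_backend_response(500, "Internal Server Error", ["id"]))
```

Returning all detected problems as a list, rather than stopping at the first, gives the coding agent more context for a single repair step.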

Ablation studies show that removing the specialized debugging tools or abandoning the agent split increases the average number of repair iterations by up to about 40 (from $74.9$ to $115.5$), underscoring their necessity for efficient, coordinated convergence.

4. Integration with LLM Training, Benchmarks, and Evaluation

FullStack-Dev is coupled with upstream model improvement and comprehensive evaluation pipelines:

  • Self-Improving LLMs (FullStack-Learn): Repository back-translation, which transforms real-world Next.js/NestJS repositories into planning/action pairs, augments LLMs with strong in-domain priors. This process improves code synthesis on full-stack tasks by $9.7\%$ (frontend), $9.5\%$ (backend), and $2.8\%$ (database) for a $30$B backbone (Lu et al., 3 Feb 2026).
  • Comprehensive Benchmarking (FullStack-Bench): Solutions are automatically benchmarked on 647 frontend, 604 backend, and 389 database scenarios. The evaluation protocol adheres to the FullStack-Bench accuracy formula:

$$\text{Accuracy} = \frac{N_{\text{Yes}} + 0.5\, N_{\text{Partial}}}{N_{\text{Total}}} \times 100\%$$

Partial verdicts are supported in GUI-driven evaluations.

  • Performance Outcomes:
    • With Qwen3-Coder-480B-A35B-Instruct: $64.7\%$ frontend (FE), $77.8\%$ backend (BE), and $77.9\%$ database (DB) accuracy; appearance score $3.72$.
    • Surpasses the prior state of the art (WebGen-Agent) by $+8.7\%$ FE, $+38.2\%$ BE, and $+15.9\%$ DB, confirming the gains from multi-agent planning and targeted debugging.
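The FullStack-Bench accuracy formula given above, in which a Partial verdict counts for half of a Yes verdict, can be computed as in this short sketch. The function name and the verdict counts in the usage line are illustrative and are not results from the paper.

```python
def fullstack_accuracy(n_yes: int, n_partial: int, n_total: int) -> float:
    """Accuracy in percent: Yes verdicts count fully, Partial verdicts count half."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if n_yes < 0 or n_partial < 0 or n_yes + n_partial > n_total:
        raise ValueError("verdict counts are inconsistent with n_total")
    return 100.0 * (n_yes + 0.5 * n_partial) / n_total


# Illustrative counts only: 60 Yes and 20 Partial verdicts out of 100 scenarios.
example = fullstack_accuracy(60, 20, 100)
print(example)
```

With these illustrative counts the score is $(60 + 0.5 \times 20) / 100 \times 100\% = 70\%$.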

5. Workflow Robustness, Error Localization, and Practical Implications

Robustness stems from a combination of methodical navigation, deterministic agent plans, and fine-grained bug localization:

  • File Paths are constructed as absolute within a monorepo root, ensuring portability and isolation.
  • Agent Loops maintain context and repair state, which enables convergence in environments with complex or incomplete dependency graphs.
  • Error Resolution: Development-oriented testing keeps the number of attempted repair iterations low; the performance study shows that efficiency degrades by 50% without these mechanisms.
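One way to obtain absolute paths confined to a monorepo root is sketched below. The `resolve_in_repo` helper and the example root `/tmp/demo-repo` are hypothetical; the source states only that paths are absolute within the root, not how the framework enforces it.

```python
from pathlib import Path


def resolve_in_repo(root: Path, relative: str) -> Path:
    """Resolve a path to an absolute path inside root, rejecting escapes."""
    root = root.resolve()
    candidate = (root / relative).resolve()
    # The candidate must be the root itself or have the root among its ancestors.
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"{relative} escapes the repository root")
    return candidate


repo_root = Path("/tmp/demo-repo")  # hypothetical root; it need not exist on disk
target = resolve_in_repo(repo_root, "backend/src/app.ts")
print(target.is_absolute())
```

Resolving before checking containment means that inputs such as `../outside` are normalized first and then rejected, which is what gives each agent an isolated view of the project tree.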

A plausible implication is that such structural partitioning and sophisticated context-tracking routines are prerequisites for scalable, low-touch automation in large enterprise settings spanning multiple application tiers (Lu et al., 3 Feb 2026).

6. Limitations and Future Directions

Although FullStack-Dev demonstrates competitive or superior accuracy and convergence properties, several open challenges remain:

  • Scalability: Multi-agent planning may require more advanced caching and memory strategies for large, monorepo-scale applications.
  • Code Drift: Excess TDD iterations introduce regression risks, motivating the integration of more advanced rollback and branching strategies.
  • Formatting Robustness: Strict adherence to XML/JSON output is more brittle on open-source LLMs compared to proprietary models, motivating future work towards “output-agency” and robust code+visual reasoning (Wan et al., 29 Sep 2025).
  • Context-Aware Expansion: As applications with dozens of interdependent pages become the norm, more intelligent context management and agent cooperation strategies are critical.

Integration of FullStack-Dev with self-improving LLM pipelines and comprehensive benchmarks establishes a reproducible foundation for further research in robust, agentic full-stack software synthesis.
