FullStack-Learn: Scaling Agentic Web Development

Updated 9 February 2026

FullStack-Learn is a self-improving methodology that enhances LLMs for end-to-end web development using real-world code repositories.
It utilizes a unique repository back-translation process to convert production code into structured training trajectories for multi-agent planning and fine-tuning.
The approach iteratively augments data and employs rigorous evaluation across frontend, backend, and database tasks to ensure robust production-level applications.

FullStack-Learn is a data-scaling and self-improving methodology designed to enhance agentic LLMs for full-stack web development tasks. It is a core component of the FullStack-Agent system, which aims to produce production-level, end-to-end web applications by tightly coupling multi-agent planning, realistic coding environments, and rigorous testing standards. FullStack-Learn systematically leverages real-world web application repositories via a process called repository back-translation, generating structured trajectories for supervised fine-tuning and synthetic augmentation to iteratively improve agentic coding performance across frontend, backend, and database domains (Lu et al., 3 Feb 2026).

1. Motivation and Challenges in Agentic Full-Stack Learning

FullStack-Learn addresses the following obstacles intrinsic to LLM-driven full-stack coding agents:

Complexity of Real Codebases: Large web stacks (e.g., Next.js, NestJS) include hundreds of files, deep directory structures, and complex interdependencies, requiring agents to master code navigation, multi-file editing, and integration of package updates.
End-to-End Data Consistency: True full-stack applications demand accurate propagation and type-consistent data exchange between the UI, backend, and persistent database layers, which frontend-only, mock-API approaches cannot provide.
Long-Horizon Decomposition: Satisfying complex user instructions (e.g., building a tracker with authentication, reporting, and multi-entity relations) involves extended multi-step planning and tool invocation, often exceeding current context windows for LLM agents.
Bug Localization and Verification: Subtle errors can manifest in any stack tier, necessitating targeted debugging support that spans source code, build pipelines, and runtime logs.

Traditional LLM code generation approaches, which focus solely on frontend demos or static code synthesis, cannot address these requirements or be robustly validated in the absence of genuine backend and database logic (Lu et al., 3 Feb 2026).

2. Repository Back-Translation Pipeline

FullStack-Learn's central mechanism is repository back-translation, which transforms high-quality, real-world web project repositories ($\mathcal{R}_{\mathrm{real}}$) into learning trajectories suitable for large-scale supervised fine-tuning (SFT). The pipeline includes:

Information Gathering Agent: Traverses the target repo, using glob searches and directory listings to locate core files. It distills each repo into a tuple containing title, description, backendPlan, frontendPlan, a paraphrased user instruction, and a quality score $q\in[0,5]$.
Trajectory Back-Translation Agent: Initializes a base Next.js + NestJS skeleton and "replays" the construction of the repo as a sequence of tool-interleaved steps ($\tau$), such as file edits, shell commands, and test invocations.
Rule-Based Cleaning: Applies deterministic rewriting to canonicalize file paths, expunge direct mentions of the upstream repository, and re-executes tool calls to generate up-to-date output artifacts.
Debug-Based Filtering: Applies debugger modules to evaluate functional and aesthetic correctness and to filter out flawed or unverifiable examples (Lu et al., 3 Feb 2026).
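The rule-based cleaning stage can be pictured as a handful of deterministic rewrites applied to each recorded step. The following is a minimal sketch, not the authors' implementation: the step layout (a dict with optional `path` and `text` keys), the canonical root `/workspace/app`, and the replacement phrase are all hypothetical, and the re-execution of tool calls is not shown.

```python
import posixpath
import re

WORKSPACE_ROOT = "/workspace/app"  # hypothetical canonical root, not from the source


def canonicalize_path(path: str, original_root: str) -> str:
    """Rewrite a path under original_root so that it lives under WORKSPACE_ROOT."""
    normalized = posixpath.normpath(path)
    root = posixpath.normpath(original_root)
    if normalized == root:
        return WORKSPACE_ROOT
    if normalized.startswith(root + "/"):
        return posixpath.join(WORKSPACE_ROOT, normalized[len(root) + 1:])
    return normalized  # paths outside the repository are only normalized


def scrub_upstream_mentions(text: str, upstream_names: list[str]) -> str:
    """Replace direct mentions of the upstream repository with a neutral phrase."""
    for name in upstream_names:
        text = re.sub(re.escape(name), "the project", text, flags=re.IGNORECASE)
    return text


def clean_step(step: dict, original_root: str, upstream_names: list[str]) -> dict:
    """Return a cleaned copy of one step; the input dict is left unchanged."""
    cleaned = dict(step)
    if "path" in cleaned:
        cleaned["path"] = canonicalize_path(cleaned["path"], original_root)
    if "text" in cleaned:
        cleaned["text"] = scrub_upstream_mentions(cleaned["text"], upstream_names)
    return cleaned
```

Because every rule is a pure function of its input, running the cleaner twice over the same trajectory gives the same result, which is the property that makes this stage deterministic.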

Each processed repo thereby generates a set of agent interaction trajectories (prompt/action pairs) capturing realistic, multi-stage development dynamics critical for robust SFT.
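To make the prompt/action pairing concrete, here is a hedged sketch of how one trajectory might be unrolled into SFT examples. The field names mirror the tuple described above (in snake_case), but the `Step` type and the prompt layout (user instruction followed by all earlier actions and observations) are assumptions; the source does not specify the exact serialization.

```python
from dataclasses import dataclass


@dataclass
class RepoSummary:
    """Output of the information gathering stage (field names follow the text)."""
    title: str
    description: str
    backend_plan: str
    frontend_plan: str
    user_instruction: str
    quality: float  # q in [0, 5]


@dataclass
class Step:
    """One tool-interleaved step of a trajectory (hypothetical layout)."""
    action: str       # e.g. a file edit, shell command, or test invocation
    observation: str  # tool output recorded after the action ran


def to_sft_pairs(summary: RepoSummary, trajectory: list[Step]) -> list[tuple[str, str]]:
    """Unroll one trajectory into (prompt, action) pairs, one per step."""
    if not 0 <= summary.quality <= 5:
        raise ValueError("quality score must lie in [0, 5]")
    pairs = []
    history = [f"Instruction: {summary.user_instruction}"]
    for step in trajectory:
        # The prompt is everything seen so far; the target is the next action.
        pairs.append(("\n".join(history), step.action))
        history.append(f"Action: {step.action}")
        history.append(f"Observation: {step.observation}")
    return pairs
```

A trajectory with n steps yields n pairs under this layout, so long construction sequences contribute many training examples whose prompts grow with the episode.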

3. Data Augmentation and Iterative Self-Improvement

To further amplify the effective dataset size and model generalization:

Repository Augmentation: An augmentation planning agent proposes five modifications per repo (one simplification, one extension, three parallel applications). An augmentation implementing agent then executes and validates these variants, typically yielding a 5x increase in synthetic repository instances.
Iterative Self-Improvement Objective: The backbone LLM $M$ is fine-tuned to minimize

$\mathcal{L}_{\mathrm{SFT}}(\theta;D) = -\sum_{(x,y)\in D}\log p_\theta(y\mid x)$

where $D$ is the set of extracted (prompt, action) interactions. The process runs for two rounds: the first back-translates real repositories to produce $D_0$ (real-only), and the second adds back-translated augmented repositories to form $D_1 = D_0 \cup D_{aug}$. Fine-tuning stops once validation accuracy on the FullStack-Bench suite saturates (Lu et al., 3 Feb 2026).
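As a worked illustration of the objective, the sketch below evaluates the summed negative log-likelihood on made-up probabilities. Each number stands in for $p_\theta(y\mid x)$ of one (prompt, action) pair; in an autoregressive model that quantity would itself be a product of per-token probabilities, which is not modeled here. The values and variable names are illustrative assumptions, not results from the paper.

```python
import math


def sft_loss(action_probs: list[float]) -> float:
    """Summed negative log-likelihood: -sum(log p_theta(y | x)) over a dataset.

    action_probs[i] stands in for the probability the model assigns to the
    recorded action of the i-th (prompt, action) pair.
    """
    if any(not 0.0 < p <= 1.0 for p in action_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    return -sum(math.log(p) for p in action_probs)


# Made-up probabilities, for illustration only.
d0_probs = [0.5, 0.25]             # pairs from real repositories (D_0)
d_aug_probs = [0.125]              # pairs from augmented repositories (D_aug)
d1_probs = d0_probs + d_aug_probs  # D_1 = D_0 ∪ D_aug, assuming the sets are disjoint

loss_d0 = sft_loss(d0_probs)  # -(ln 0.5 + ln 0.25) = 3 ln 2
loss_d1 = sft_loss(d1_probs)  # adds -ln 0.125, giving 6 ln 2
```

Because the loss is a sum rather than a mean, enlarging the dataset with augmented pairs adds terms to it; comparisons across rounds are therefore better made on held-out accuracy than on raw loss values.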

4. Evaluation in the FullStack-Bench Testbed

Performance gains attributable to FullStack-Learn are benchmarked using FullStack-Bench, which analyzes frontend, backend, and database tasks separately. The testbed features:

Test Tier	Coverage	Verification Agent
Frontend	647 GUI tasks	Qwen3-VL-235B-A22B GUI-agent
Backend	604 API endpoints	Qwen3-Coder-480B-A35B
Database	389 schema/data checks	JSON query agent

Formal metrics are:

$\mathrm{Acc}_{FE} = \frac{N_{Yes} + 0.5N_{Partial}}{N_{Total}}\times100\%,\quad \mathrm{Acc}_{BE} = \frac{N_{Yes}^{BE}}{N_{Total}^{BE}}\times100\%,\quad \mathrm{Acc}_{DB} = \frac{N_{Yes}^{DB}}{N_{Total}^{DB}}\times100\%$
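The three metrics translate directly into code. This is a small sketch of the formulas as stated, with function and parameter names chosen here for illustration: the frontend score gives half credit to partial passes, while the backend and database scores are plain pass rates.

```python
def frontend_accuracy(n_yes: int, n_partial: int, n_total: int) -> float:
    """Acc_FE in percent: full passes plus half credit for partial passes."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if n_yes < 0 or n_partial < 0 or n_yes + n_partial > n_total:
        raise ValueError("counts must be non-negative and sum to at most n_total")
    return (n_yes + 0.5 * n_partial) / n_total * 100.0


def pass_rate(n_yes: int, n_total: int) -> float:
    """Acc_BE or Acc_DB in percent: fraction of checks that pass."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_yes <= n_total:
        raise ValueError("n_yes must lie between 0 and n_total")
    return n_yes / n_total * 100.0
```

For example, 40 full passes and 20 partial passes out of 100 GUI tasks give a frontend accuracy of 50%, the same score as 50 full passes with no partial ones.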

Empirical results for a 30B model demonstrate that FullStack-Learn delivers statistically significant improvements after two learning rounds (relative to baseline):

$\begin{aligned} \text{Baseline (30B):}\quad & \mathrm{Acc}_{FE}=37.2\%,\; \mathrm{Acc}_{BE}=38.7\%,\; \mathrm{Acc}_{DB}=50.9\% \\ \text{After Round 1:}\quad & \mathrm{Acc}_{FE}=42.3\%\,(+5.1),\; \mathrm{Acc}_{BE}=45.4\%\,(+6.7),\; \mathrm{Acc}_{DB}=51.2\%\,(+0.3) \\ \text{After Round 2:}\quad & \mathrm{Acc}_{FE}=46.9\%\,(+9.7),\; \mathrm{Acc}_{BE}=48.2\%\,(+9.5),\; \mathrm{Acc}_{DB}=53.7\%\,(+2.8) \end{aligned}$
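The gains in parentheses are absolute differences in percentage points relative to the baseline. The short check below recomputes them from the reported accuracies; the dictionary layout and the `gains` helper are illustrative names introduced here.

```python
# Accuracies (in percent) as reported above.
baseline = {"FE": 37.2, "BE": 38.7, "DB": 50.9}
round1 = {"FE": 42.3, "BE": 45.4, "DB": 51.2}
round2 = {"FE": 46.9, "BE": 48.2, "DB": 53.7}


def gains(after: dict[str, float], before: dict[str, float]) -> dict[str, float]:
    """Absolute gain in percentage points per tier, rounded to one decimal."""
    return {tier: round(after[tier] - before[tier], 1) for tier in before}


round1_gains = gains(round1, baseline)
round2_gains = gains(round2, baseline)
```

Both rounds are measured against the same baseline, so the second-round figures are cumulative rather than increments over the first round.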

All improvements are significant with $p<0.01$ (McNemar’s test) (Lu et al., 3 Feb 2026).
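McNemar's test compares two systems on the same set of items using only the discordant pairs, that is, items where exactly one system succeeds. The sketch below implements the exact (binomial) two-sided variant for binary pass/fail outcomes. Whether the authors used the exact or the chi-square form, and how partial frontend passes were binarized, is not stated in the source, so this is only an illustration of the technique.

```python
from math import comb


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b: items the baseline failed and the fine-tuned model passed
    c: items the baseline passed and the fine-tuned model failed
    Under the null hypothesis each discordant item falls on either side
    with probability 0.5, so min(b, c) follows Binomial(b + c, 0.5).
    """
    if b < 0 or c < 0:
        raise ValueError("counts must be non-negative")
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

With hypothetical counts of 10 items fixed and 0 broken, the p-value is 2/1024, below the 0.01 threshold; with 5 fixed and 5 broken it is 1.0.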

5. Significance, Strengths, and Limitations

FullStack-Learn introduces several advances:

Learning Real Web Development Dynamics: By extracting and replaying genuine workflows from production repositories, the methodology enables agents to grasp idioms and development patterns unattainable from synthetic or frontend-only tasks.
Effective Scaling via Augmentation: Systematic modification and validation of base repositories yield roughly a fivefold increase in repository instances (five proposed variants per base repository), supporting generalization and robustness to new instructions.
Modular Integration in Agent Frameworks: The output data format and back-translation loop are stack-agnostic, although current experiments use Next.js/NestJS exclusively, which leaves extension to other stacks as an open avenue.

Principal limitations include template dependence, significant computation and inference latency due to tool-driven episode length, and incomplete verification for deep business rules and security policies (Lu et al., 3 Feb 2026).

6. Future Directions

Enhancements under consideration encompass:

Automated stack detection and onboarding for other web frameworks (e.g., Django + React, Flask + Vue) through prompt induction.
Incorporation of backend and database tests as dense signals for reinforcement learning or reward model training.
Integration of vulnerability scanning and security linters into the debug-and-backtranslate-test loop.
Continuous learning by periodically mining and replaying steps from agent-generated repositories deployed in real-world settings (Lu et al., 3 Feb 2026).

A plausible implication is that systematic repository back-translation, combined with agentic planning architectures and benchmark-driven self-improvement, constitutes a scalable methodology for closing the gap between code synthesis demos and production-ready full-stack applications.


References

Lu et al. (2026). FullStack-Agent: Enhancing Agentic Full-Stack Web Coding via Development-Oriented Testing and Repository Back-Translation.
