LEGO Framework: Recursive Self-Training
- LEGO Framework is a recursive machine learning paradigm that combines self-supervision, recursive problem decomposition, and curriculum-driven training to enhance model performance.
- It employs synthetic data generation and persistent external supervision to mitigate model collapse and ensure robust convergence.
- Its versatile instantiations span generative modeling, combinatorial optimization, and language reasoning, leading to significant empirical gains in efficiency and accuracy.
The LEGO framework refers to a general paradigm for recursively constructing, training, and refining machine learning systems that leverage a combination of recursive structure, empirical self-training, and dynamic label or supervision generation. Variants of this framework have emerged independently across domains including combinatorial optimization, generative modeling, agentic reasoning, computer vision, and language modeling. Canonical implementations instantiate recursive training loops where a model not only solves a complex task via recursive subproblem decomposition but also generates its own supervision or curriculum, culminating in continuous self-improvement. The theoretical and practical properties of such frameworks have been rigorously analyzed in recent literature, including their asymptotic behaviors, model collapse tendencies, and curriculum-driven performance gains.
1. Formal Recursive Self-Training Structure
At the heart of the LEGO framework is a recursive training loop. Given a model with parameters $\theta_t$ at iteration $t$, the process involves:
- Generating synthetic data or subproblems from the current model or task instance.
- Solving or annotating these new instances, possibly using the current model itself.
- Aggregating the resulting data (and optionally, labels or preference weights) to perform a supervised or reinforcement learning update.
- Repeating this cycle with updated model parameters.
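The four steps above can be sketched as a generic loop. This is an illustrative skeleton, not any cited system's implementation: `model`, `solve`, and `update` are placeholders for a concrete generator, annotator, and learning update, and the real-data mixing anticipates the persistent-excitation requirement discussed below.

```python
import random

def recursive_self_training(model, real_data, update, solve,
                            rounds=10, mix_rate=0.1, batch=64):
    """Generic sketch of the recursive self-training loop.

    model     -- any object with a .sample() method (placeholder)
    real_data -- list of already-labeled real examples
    update    -- callable (model, labeled_batch) -> new model
    solve     -- callable (model, instance) -> label/annotation
    """
    for t in range(rounds):
        # 1. Generate synthetic instances from the current model.
        synthetic = [model.sample() for _ in range(batch)]
        # 2. Solve/annotate them, possibly using the model itself.
        labeled = [(x, solve(model, x)) for x in synthetic]
        # 3. Aggregate, mixing in a fraction of fresh real data.
        n_real = min(int(mix_rate * batch), len(real_data))
        labeled += random.sample(real_data, n_real)
        # 4. Perform the learning update and repeat.
        model = update(model, labeled)
    return model
```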
A formal mathematical abstraction is seen in recursive self-training for generative models: let $\mathcal{X}$ be a Polish sample space and $\mu_t$ the model (a probability measure on $\mathcal{X}$) at step $t$. In pure recursion, at each step,

$$\mu_{t+1} = \frac{1}{N} \sum_{i=1}^{N} \delta_{X_i}, \qquad X_1, \ldots, X_N \sim \mu_t \ \text{i.i.d.},$$

i.e., the next model is the empirical distribution of $N$ samples from the current one (Borkar, 11 Jun 2025). For augmented recursion, the samples are drawn instead from the mixture

$$X_1, \ldots, X_N \sim (1 - \epsilon)\,\mu_t + \epsilon\,\nu,$$

where $\nu$ is an external "true" data distribution and $\epsilon \in (0, 1]$ is the mixing rate.
This recursive loop is mirrored in other domains, such as DP-guided self-training for combinatorial optimization (Brusca et al., 2023), recursive variant generation in language modeling (Simonds et al., 2 Mar 2025), or self-improvement cycles in agent frameworks (Yin et al., 6 Oct 2024).
2. Asymptotic Regimes and Theoretical Guarantees
The recursive nature of LEGO frameworks gives rise to distinct asymptotic regimes:
- Pure Recursion: If synthetic generations are used exclusively, the sequence of learned distributions typically converges to a random Dirac measure $\delta_X$, i.e., model collapse: all probability mass concentrates on a single (random) point $X$ (Borkar, 11 Jun 2025). This results from martingale convergence in empirical measure space, and the collapse is inevitable absent any external anchor.
- Augmented Regime (Persistent Excitation): Introducing even an arbitrarily small fraction $\epsilon > 0$ of fresh samples from a true data distribution $\nu$ breaks collapse: the empirical measure sequence forms an ergodic Feller chain whose invariant distribution has barycenter $\nu$. The synthetic samples stabilize around $\nu$, and collapse is replaced by mild degeneration, i.e., oscillation instead of singular convergence. Thus, persistent external "noise" or supervision is essential to maintain statistical diversity and anchoring.
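The contrast between the two regimes can be seen in a toy simulation not taken from the cited analysis: iterate the empirical-measure update for a Bernoulli model. With $\epsilon = 0$ the parameter is absorbed at 0 or 1 (a Dirac measure); mixing in samples from $\nu = \mathrm{Bernoulli}(0.5)$ keeps it fluctuating around $0.5$.

```python
import random

def recurse(n_samples=50, steps=2000, eps=0.0, seed=0):
    """Iterate the empirical-measure update on {0, 1}.

    eps = 0 is pure recursion; eps > 0 mixes in draws from the
    anchoring distribution nu = Bernoulli(0.5).
    Returns the final model parameter p (probability of 1).
    """
    rng = random.Random(seed)
    p = 0.5  # current model: Bernoulli(p)
    for _ in range(steps):
        draws = [
            # With probability eps sample from nu, else from the model.
            (rng.random() < 0.5) if rng.random() < eps else (rng.random() < p)
            for _ in range(n_samples)
        ]
        p = sum(draws) / n_samples  # next model = empirical distribution
    return p

pure = recurse(eps=0.0)      # absorbed at 0.0 or 1.0: model collapse
anchored = recurse(eps=0.2)  # oscillates around 0.5: mild degeneration
```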
Analogous phenomena manifest in self-improving diffusion models, where recursive self-training on synthetic generations can cause distributional drift and collapse unless controlled by careful prompt filtering, preference sampling, or distributional weighting (Zhang et al., 14 Feb 2025). In zero-shot semantic segmentation, a similar "loop closure" via pseudo-feature injection mitigates seen-class bias and prevents degenerate feature collapse (Wang et al., 2021).
3. Algorithmic Templates: Instantiations Across Domains
Several concrete instantiations of the LEGO paradigm appear in the literature, each tailored to the structural properties of its domain:
| Domain | Recursion Type | Self-Training Mechanism |
|---|---|---|
| Generative Modeling | Empirical measure update | Synthetic sample mixing |
| Combinatorial Optimization (MIS) | DP-style subgraph choice | GNN-based comparator self-annotation |
| LLM Reasoning/Agents | Recursive code mutation | Self-evaluated utility improvement |
| Diffusion Models | Round-wise RSI | Preference sampling, prompt filtering |
| Zero-Shot Segmentation | Feature generator–classifier | High-confidence pseudo-feature feedback |
- Combinatorial Optimization (Maximum Independent Set): The framework recursively splits a problem into subproblems (e.g., removing a vertex or its neighbors), with a GNN-based comparator learned via self-generated rollouts. The same self-training loop—buffering subproblem comparisons, labeling via repeated recursions, and updating the GNN to better align with inferred outcomes—yields substantial improvement over "flat" or human-labeled pipelines (Brusca et al., 2023).
- Agent Self-Improvement: Frameworks such as the Gödel Agent use recursive, LLM-powered code inspection and rewriting, guided solely by a high-level utility function. The agent proposes code edits, evaluates performance, and recurses until utility stalls, safeguarded by error-handling rollbacks and self-consistency checks (Yin et al., 6 Oct 2024).
- Recursive Curriculum Learning (LADDER): Given a hard instance (e.g., an integral), recursively generate easier variants until the model can be reliably trained on the synthetic curriculum. RL is then performed across this generated instance tree, ensuring a smooth reward gradient and preventing policy collapse (Simonds et al., 2 Mar 2025).
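The MIS recursion underlying the first instantiation can be written compactly. The sketch below is the exact DP-style decomposition (branch on excluding a vertex or including it and deleting its closed neighborhood); the `choose` argument is a hypothetical stand-in for the learned GNN comparator that, in the cited work, decides which branch vertex to expand.

```python
def mis_size(adj, nodes=None, choose=None):
    """Size of a Maximum Independent Set via recursive decomposition.

    adj    -- dict mapping vertex -> set of neighbouring vertices
    nodes  -- current subproblem (subset of vertices); defaults to all
    choose -- optional vertex-selection policy (stand-in for a learned
              comparator); defaults to an arbitrary vertex
    """
    if nodes is None:
        nodes = set(adj)
    if not nodes:
        return 0
    v = choose(nodes) if choose else next(iter(nodes))
    # Branch 1: exclude v from the independent set.
    excl = mis_size(adj, nodes - {v}, choose)
    # Branch 2: include v, removing v and all its neighbours.
    incl = 1 + mis_size(adj, nodes - {v} - adj[v], choose)
    return max(excl, incl)
```

Exponential in the worst case, the recursion is exact; the self-training loop in (Brusca et al., 2023) learns which subproblems to prefer rather than enumerating both branches exhaustively.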
4. Model Collapse, Degeneration, and Mitigation
A central finding across LEGO frameworks is the universal risk of model collapse under recursive self-training in the absence of persistent external supervision (Borkar, 11 Jun 2025). This phenomenon arises because, absent external input, the stochastic recursion is a measure-valued martingale: per-step sampling noise accumulates, probability mass progressively concentrates, and the empirical distributions are driven toward extremal (Dirac) points in probability space.
Mitigation strategies include:
- Persistent Data Injection: Systematically mix even a small amount of fresh, real data at every recursive step.
- Strategic Sampling and Filtering: In generative image models, apply prompt filtering (for clarity/diversity), preference sampling (to select perceptually aligned or human-preferred samples), and distributional weighting (to penalize out-of-distribution or hallucinatory samples) (Zhang et al., 14 Feb 2025).
- Confidence-Weighted Pseudo-Labels: In segmentation and classification tasks, only inject high-confidence pseudo-examples, and down-weight their impact to avoid error accumulation (Wang et al., 2021).
- Curriculum Design: Structure recursive tasks or variant generation to ensure a gradual, well-behaved difficulty gradient, so that each step in the curriculum is solvable and reliably verifiable (Simonds et al., 2 Mar 2025).
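A minimal sketch of the confidence-weighted pseudo-label strategy, with all details (threshold value, weighting rule, tuple layout) assumed for illustration rather than taken from the cited papers:

```python
def select_pseudo_examples(predictions, threshold=0.9, down_weight=0.5):
    """Filter pseudo-labeled examples by confidence and shrink their weight.

    predictions -- iterable of (example, pseudo_label, confidence) tuples
    threshold   -- minimum confidence for an example to be injected
    down_weight -- factor shrinking the loss weight of injected examples
                   so accumulated pseudo-label errors cannot dominate
    Returns (example, pseudo_label, loss_weight) tuples for training.
    """
    batch = []
    for example, label, confidence in predictions:
        if confidence >= threshold:
            batch.append((example, label, down_weight * confidence))
    return batch
```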
Empirical evidence consistently demonstrates that such mitigations are necessary to preserve generalization, avoid collapse, and yield non-trivial solution quality in recursive self-improving systems.
5. Experimental Evidence and Performance Metrics
Rigorous ablation studies and empirical benchmarks substantiate the practical value of LEGO frameworks:
- Maximum Independent Set: The DP-self-training GNN approach outperforms prior methods on synthetic and real datasets, with approximation ratios and consistency climbing steadily across training cycles (Brusca et al., 2023).
- LADDER and TTRL: On symbolic integration, Llama 3.2 3B's pass@1 accuracy is raised from 1% to 82% without human labeling, and Qwen 2.5 7B reaches 90% on the MIT Integration Bee via recursive curriculum RL (Simonds et al., 2 Mar 2025).
- Gödel Agent: Recursive self-editing agents surpass fixed-pipeline or meta-learned agents in both accuracy and cost-efficiency across QA and math benchmarks, with notable gains in MGSM (+11% over SOTA in constrained mode) (Yin et al., 6 Oct 2024).
- Self-evolving Diffusion Models: Recursive self-improvement with filtering and weighting yields +7 percentage points on HPS v2.1 and outperforms naive supervised fine-tuning after four rounds, using significantly fewer synthetic examples (Zhang et al., 14 Feb 2025).
- Zero-Shot Segmentation: Recursive pseudo-feature injection via ZS-MMD achieves state-of-the-art harmonic IoU metrics across Pascal VOC/Context (e.g., 37.5% unseen-mIoU vs. 35.4% for prior art at K=2) (Wang et al., 2021).
| Framework/Domain | Main Quantitative Effect |
|---|---|
| DP-Self-Training for MIS | Improved MIS-approximation; rising consistency |
| LADDER on Symbolic Integration | 1%→82% Llama 3.2 3B accuracy; SOTA on MIT Bee |
| Gödel Agent | +11% over SOTA on MGSM at lower cost |
| RSIDiffusion | +7% (HPS v2.1), +181.6 (ImageReward) |
| Recursive ZS Semantic Segmentation | SOTA hIoU for unseen classes (Δ up to +6%) |
6. Generalization, Applicability, and Practical Guidelines
LEGO-style recursive frameworks are not confined to any single domain or architecture. Their applicability is marked by:
- Existence of a self-reducible problem structure (e.g., DP recurrence, curriculum tree, agent codebase).
- Capacity to generate, verify, or annotate new subproblems or samples from the current model/policy.
- Verifiable improvement metric (analytic, via RL, or through preference-based filtering).
- Facility for incremental, on-policy retraining, ensuring positive feedback between better policies and more informative supervision.
Practical deployment mandates continuous external anchoring—be it in the form of true data, human-in-the-loop interventions, or robust heuristics for outliers and hallucinations. Recursion depth, rollout count, and filtering thresholds represent crucial trade-off hyperparameters, balancing computational burden against convergence and solution quality.
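The trade-off hyperparameters named above can be collected into a single configuration object. The knob names and defaults below are hypothetical, chosen only to make the trade-offs concrete:

```python
from dataclasses import dataclass

@dataclass
class RecursionConfig:
    """Illustrative hyperparameters for a LEGO-style recursive loop."""
    max_depth: int = 5                 # recursion depth: compute vs. coverage
    rollouts: int = 32                 # rollouts per subproblem: label quality vs. cost
    confidence_threshold: float = 0.9  # filtering: error accumulation vs. data volume
    real_mix_rate: float = 0.1         # persistent excitation guarding against collapse

    def synthetic_fraction(self) -> float:
        """Fraction of each training batch drawn from the model itself."""
        return 1.0 - self.real_mix_rate
```

Raising `max_depth` or `rollouts` buys supervision quality at compute cost, while `real_mix_rate` must stay strictly positive to provide the external anchoring discussed in Section 2.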
7. Significance and Outlook
The LEGO framework crystallizes a core methodological advance in self-improving AI systems: the coupling of recursive, structure-exploiting decision-making with self-bootstrapped, dynamically generated supervision. This removes reliance on fixed datasets or human annotation and enables rapid, autonomous progression in complex domains, including those with no affordable human-curated curricula.
The rigorous theoretical characterization of collapse and persistence, coupled with substantial empirical advances in performance across optimization, generation, reasoning, and perception, positions the LEGO paradigm as a central template for future self-improving machine learning systems. Ongoing directions include generalizing the pattern to multi-agent competition, richer DP or recursive decompositions, and stronger guarantees on long-run diversity and generalization under minimal external anchoring.