FLEX: Forward Learning with Experience
- FLEX is a gradient-free agent learning paradigm that externalizes evolution by accumulating structured, textual experiences instead of modifying internal parameters.
- It leverages a hierarchical repository to capture both successful and failed reasoning trajectories, enabling efficient retrieval and continuous performance gains across diverse domains.
- Empirical validations show FLEX offers significant, sample-efficient improvements and facilitates experience inheritance across different LLM architectures.
Forward Learning with Experience (FLEX) is a gradient-free agent learning paradigm that enables LLM-based agents to achieve continuous, inheritable evolution by externalizing knowledge in a hierarchically structured, textual experience library. In contrast to traditional fine-tuning approaches, FLEX updates not the agent’s internal parameters, but a curated and expandable repository of experience—capturing both successful and failed reasoning trajectories. This experience-centric approach delivers scalable, sample-efficient, and portable improvements for frozen LLM agents in diverse domains such as mathematical reasoning, chemistry, and biology.
1. Motivation and Theoretical Foundation
Current LLM agents are deployed as essentially static entities: their internal weights are frozen post-training, precluding direct learning from real-time experience. This immutability presents three major challenges: the prohibitive computational cost of large-scale fine-tuning; the risk of catastrophic forgetting when updating for new tasks; and the inaccessibility of parameter-level updates in proprietary, closed-source models. FLEX reframes continual agent learning as the construction and dynamic refinement of an external experience library, accumulating and organizing interaction traces to inform future reasoning. The learning objective is as follows:

$$\mathcal{E}^* = \arg\max_{\mathcal{E}} \; \mathbb{E}_{(X,Y)} \left[ \mathrm{Acc}\big(\pi(X, R(\mathcal{E}, X)),\, Y\big) \right]$$

Here, $\pi$ is the frozen agent, $\mathcal{E}$ is the experience library, $R(\mathcal{E}, X)$ retrieves the most relevant experiences for query $X$, and $\mathrm{Acc}$ measures solution accuracy. No gradient or weight update occurs; instead, the evolution of $\mathcal{E}$ is governed by a forward rule:

$$\mathcal{E}_{t+1} = \mu(\mathcal{E}_t, \varepsilon_t)$$

where $\mu$ is an updater that distills and merges new experiences $\varepsilon_t$ into the library based on observed trajectories.
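The objective and the forward rule above can be sketched as two plain functions. This is a minimal illustration, not the paper's implementation: `frozen_agent`, `retrieve`, `accuracy`, and `updater` are hypothetical stand-ins for $\pi$, $R$, $\mathrm{Acc}$, and $\mu$.

```python
# Minimal sketch of FLEX's learning objective and forward update rule.
# All callables are illustrative placeholders, not the paper's actual API.

def evaluate(frozen_agent, library, retrieve, accuracy, data):
    """Score the frozen agent when conditioned on retrieved experiences."""
    total = 0.0
    for query, gold in data:
        experiences = retrieve(library, query)      # R(E, X)
        answer = frozen_agent(query, experiences)   # pi(X, R(E, X))
        total += accuracy(answer, gold)             # Acc(., Y)
    return total / len(data)

def forward_update(library, new_experiences, updater):
    """E_{t+1} = mu(E_t, eps_t): no gradients, only library curation."""
    return updater(library, new_experiences)
```

Note that `evaluate` never touches the agent's parameters: all adaptation happens through the `library` argument, which is the only mutable state in the system.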
2. Experience Library: Structure and Retrieval
FLEX’s knowledge repository, $\mathcal{E}$, is implemented as a hierarchical, textual database supporting both fine- and coarse-grained reasoning. The structure is as follows:
| Hierarchy Level | Content Type | Example Use |
|---|---|---|
| 1. Strategic | Principles, Guidelines | General approaches to tasks |
| 2. Methodology | Templates, Reasoning Patterns | Stepwise problem-solving schemes |
| 3. Instance | Factual Examples, Specific Cases | Concrete worked solutions or errors |
The library partitions entries into a "golden zone" (successes) and a "warning zone" (diagnosed failures). Each experience includes a descriptive trace, abstraction-level tags, and semantic links to related entries. At inference, natural-language queries against $\mathcal{E}$ retrieve contextually relevant knowledge in a coarse-to-fine retrieval cascade, not merely via vector similarity. This architecture enables flexible memory management, merging and deduplication of overlapping experiences, and support for diverse reasoning abstractions.
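The three-level hierarchy and coarse-to-fine retrieval can be sketched as follows. The field names, the keyword-overlap scorer, and the exact-text deduplication rule are assumptions made for illustration; the paper specifies only the level/zone structure and the cascaded retrieval.

```python
# Hedged sketch of the hierarchical experience library described above.
from dataclasses import dataclass, field

LEVELS = ("strategic", "methodology", "instance")

@dataclass
class Experience:
    level: str                         # one of LEVELS
    zone: str                          # "golden" (success) or "warning" (failure)
    text: str
    tags: set = field(default_factory=set)

class ExperienceLibrary:
    def __init__(self):
        self.entries = []

    def add(self, exp):
        # Toy deduplication: skip exact-text duplicates at the same level.
        if not any(e.level == exp.level and e.text == exp.text
                   for e in self.entries):
            self.entries.append(exp)

    def retrieve(self, query, per_level=2):
        """Coarse-to-fine cascade: score entries by keyword overlap with the
        query, keeping the top matches at each abstraction level in order."""
        words = set(query.lower().split())
        results = []
        for level in LEVELS:           # strategic first, instance last
            scored = [(len(words & set(e.text.lower().split())), e)
                      for e in self.entries if e.level == level]
            scored.sort(key=lambda pair: -pair[0])
            results.extend(e for score, e in scored[:per_level] if score > 0)
        return results
```

A production version would replace the keyword scorer with the semantic retrieval the paper describes, but the cascade ordering (strategic before instance) is the structural point.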
3. Reflection and Library Evolution Mechanisms
During interaction, FLEX employs an actor–critic architecture within a Markov Decision Process (MDP). The actor ($\pi$) samples multiple trajectories per query via parallel rejection sampling and iterative self-refinement under critic feedback. The critic then:
- Distills correct trajectories into concise, reusable golden experiences.
- Diagnoses errors in incorrect trajectories, abstracting them as improvement suggestions ("warnings").
Both golden and warning meta-actions are synthesized into $\varepsilon_t$, the candidate updates for the experience library. The meta-updater $\mu$ then merges, inserts, or prunes entries, evolving $\mathcal{E}$ without gradient signals:

$$\mathcal{E}_{t+1} = \mu(\mathcal{E}_t, \varepsilon_t)$$
FLEX Algorithm (pseudocode overview):
```
Initialize experience library E ← ∅
for each epoch:
    for (X_i, Y_i) in data:
        generate T trajectories τ_{i,t} using π and critic
        for each τ_{i,t}:
            distill ε⁺ (success) or ε⁻ (failure)
        update E ← μ(E, union of all ε)
return E
```
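The pseudocode above translates directly into a runnable loop. In this sketch the actor, critic, and meta-updater are toy stand-ins (in FLEX they are LLM calls); only the control flow is faithful to the algorithm.

```python
# Runnable sketch of the FLEX training loop; actor/critic/updater are
# placeholder callables, not the paper's LLM-based components.
import random

def flex_train(data, actor, critic, updater, n_trajectories=3, epochs=2, seed=0):
    rng = random.Random(seed)
    library = []                                   # E <- empty
    for _ in range(epochs):
        for x, y in data:
            candidates = []
            for _ in range(n_trajectories):        # parallel rejection sampling
                trajectory = actor(x, library, rng)
                verdict = critic(trajectory, y)    # distill eps+ or diagnose eps-
                if verdict is not None:
                    candidates.append(verdict)
            library = updater(library, candidates) # E <- mu(E, union of eps)
    return library
```

The key property the sketch preserves is that `library` is the only state that changes across epochs; the actor itself is never updated.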
This process ensures that FLEX learns exclusively through forward accumulation and curation of structured experience.
4. Information-Theoretic and Empirical Scaling Properties
FLEX’s information-theoretic underpinnings interpret library construction as a process that seeks to minimize the model’s posterior entropy with respect to ground-truth outcomes:

$$\mathcal{E}^* = \arg\min_{\mathcal{E}} \; H\big(Y \mid X, R(\mathcal{E}, X)\big)$$
Empirically, a scaling law emerges: as the size of $\mathcal{E}$ increases, both training and test accuracies exhibit predictable, power-law improvements (e.g., on GSM8K, accuracy rises steadily as $|\mathcal{E}|$ grows from $1{,}000$ to $1{,}900$ entries), while library growth itself is governed by a logistic curve over training epochs. These results establish that experience accumulation yields diminishing but predictable returns in performance.
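A power-law claim of this kind can be checked with a least-squares fit in log-log space. The sketch below uses purely illustrative synthetic points, not the paper's GSM8K measurements.

```python
# Fit error ~ a * size^b by linear regression in log-log space.
import math

def fit_power_law(sizes, errors):
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Ordinary least squares slope and intercept in log space.
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b

# Illustrative synthetic data: error falling as size^(-0.5).
sizes = [1000, 1300, 1600, 1900]
errors = [0.20 * (s / 1000) ** -0.5 for s in sizes]
a, b = fit_power_law(sizes, errors)   # recovers the exponent b ≈ -0.5
```

A near-linear log-log plot (equivalently, a stable fitted exponent `b`) is the standard diagnostic for the power-law regime the paper reports.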
5. Experience Inheritance and Portability
Because knowledge is encoded in the external library rather than model parameters, FLEX enables direct transfer—termed "experience inheritance"—across agents with potentially different architectures. Empirical results demonstrate that importing a library constructed by a strong agent into a weaker one (or vice versa) provides substantial performance improvements:
| Domain | Baseline Acc. | FLEX-Inherited Acc. | Delta |
|---|---|---|---|
| AIME25 (Math) | 50.0% | 63.3% | +13.3 pp |
| USPTO50k | 12.0% | 18.0% | +6.0 pp |
| ProteinGym | 51.5% (ρ) | 56.6% (ρ) | +5.1 |
This supports the premise that $\mathcal{E}$ encodes model-agnostic, generalizable strategies and that cross-model "distillation" is feasible.
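Because the library is plain text, inheritance reduces to serializing one agent's library and merging it into another's. The JSON schema below is an assumption made for illustration; the paper does not prescribe a serialization format.

```python
# Sketch of experience inheritance via a hypothetical JSON interchange format.
import json

def export_library(entries):
    """Serialize experiences (dicts with level/zone/text) to a JSON string."""
    return json.dumps({"version": 1, "entries": entries}, indent=2)

def import_library(payload, existing=None):
    """Load an exported library, merging with and deduplicating against any
    experiences the receiving agent already holds."""
    incoming = json.loads(payload)["entries"]
    merged = list(existing or [])
    seen = {e["text"] for e in merged}
    for e in incoming:
        if e["text"] not in seen:
            merged.append(e)
            seen.add(e["text"])
    return merged
```

Nothing in the transfer references the donor model's architecture, which is exactly why a library built by a strong agent can be loaded by a weaker one (or vice versa).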
6. Empirical Validation and Component Analysis
Evaluation across mathematics (AIME25, GSM8K), chemical retrosynthesis (USPTO50k), and protein fitness prediction (ProteinGym) against vanilla LLM, in-context learning, and ReAct baselines demonstrates that FLEX confers large, sample-efficient performance improvements without any parameter tuning. Substantial absolute gains are reported on AIME25 and USPTO50k (in percentage points) and on ProteinGym (in Spearman's ρ). Component ablation studies confirm that each element (exploration, meta-level evolution, regression assistance) provides a measurable contribution to accuracy.
7. Limitations and Prospects
FLEX’s experience libraries can grow large, raising demands for efficient compression, summarization, and retrieval mechanisms to maintain real-time applicability. Certain domains (e.g., chemistry) may require representations beyond pure text, such as graphs or structured schemas. Open problems include automating the merging of domain-specific libraries into collective knowledge pools, developing advanced auditing and human-in-the-loop editing, and extending FLEX to even more interactive or lifelong learning environments. Continued work aims to elaborate these directions and further shift agent learning paradigms from parameter-centric to experience-centric modalities.