FLEX: Forward Learning with Experience
- FLEX is a gradient-free agent learning paradigm that externalizes evolution by accumulating structured, textual experiences instead of modifying internal parameters.
- It leverages a hierarchical repository to capture both successful and failed reasoning trajectories, enabling efficient retrieval and continuous performance gains across diverse domains.
- Empirical validations show FLEX offers significant, sample-efficient improvements and facilitates experience inheritance across different LLM architectures.
Forward Learning with Experience (FLEX) is a gradient-free agent learning paradigm that enables LLM-based agents to achieve continuous, inheritable evolution by externalizing knowledge in a hierarchically structured, textual experience library. In contrast to traditional fine-tuning approaches, FLEX updates not the agent’s internal parameters, but a curated and expandable repository of experience—capturing both successful and failed reasoning trajectories. This experience-centric approach delivers scalable, sample-efficient, and portable improvements for frozen LLM agents in diverse domains such as mathematical reasoning, chemistry, and biology.
1. Motivation and Theoretical Foundation
Current LLM agents are deployed as essentially static entities: their internal weights are frozen post-training, precluding direct learning from real-time experience. This immutability presents three major challenges: the prohibitive computational cost of large-scale fine-tuning; the risk of catastrophic forgetting when updating for new tasks; and the inaccessibility of parameter-level updates in proprietary, closed-source models. FLEX reframes continual agent learning as the construction and dynamic refinement of an external experience library, accumulating and organizing interaction traces to inform future reasoning. The learning objective is as follows:

$$\mathcal{E}^* = \arg\max_{\mathcal{E}} \; \mathbb{E}_{(X,Y)} \left[ \mathrm{Acc}\big(\pi(X, R(\mathcal{E}, X)),\, Y\big) \right]$$

Here, $\pi$ is the frozen agent, $\mathcal{E}$ is the experience library, $R(\mathcal{E}, X)$ retrieves the most relevant experiences for query $X$, and $\mathrm{Acc}$ measures solution accuracy. No gradient or weight update occurs; instead, the evolution of $\mathcal{E}$ is governed by a forward rule:

$$\mathcal{E}_{t+1} = \mu(\mathcal{E}_t, \varepsilon_t)$$

where $\mu$ is an updater that distills and merges new experiences $\varepsilon_t$ into the library based on observed trajectories.
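The objective and the forward rule above can be sketched as two plain functions. This is a minimal illustration, not the paper's implementation: `frozen_agent`, `retrieve`, `accuracy`, and `updater` are hypothetical stand-ins for $\pi$, $R$, $\mathrm{Acc}$, and $\mu$.

```python
# Minimal sketch of FLEX's learning objective and forward update rule.
# All callables are illustrative placeholders, not the paper's actual API.

def evaluate(frozen_agent, library, retrieve, accuracy, data):
    """Score the frozen agent when conditioned on retrieved experiences."""
    total = 0.0
    for query, gold in data:
        experiences = retrieve(library, query)      # R(E, X)
        answer = frozen_agent(query, experiences)   # pi(X, R(E, X))
        total += accuracy(answer, gold)             # Acc(., Y)
    return total / len(data)

def forward_update(library, new_experiences, updater):
    """E_{t+1} = mu(E_t, eps_t): no gradients, only library curation."""
    return updater(library, new_experiences)
```

Note that `evaluate` never touches the agent's parameters: all adaptation happens through the `library` argument, which is the only mutable state in the system.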
2. Experience Library: Structure and Retrieval
FLEX’s knowledge repository, $\mathcal{E}$, is implemented as a hierarchical, textual database supporting both fine- and coarse-grained reasoning. The structure is as follows:
| Hierarchy Level | Content Type | Example Use |
|---|---|---|
| 1. Strategic | Principles, Guidelines | General approaches to tasks |
| 2. Methodology | Templates, Reasoning Patterns | Stepwise problem-solving schemes |
| 3. Instance | Factual Examples, Specific Cases | Concrete worked solutions or errors |
The library partitions entries into a "golden zone" (successes) and a "warning zone" (diagnosed failures). Each experience includes a descriptive trace, abstraction-level tags, and semantic links to related entries. At inference, natural-language queries against $\mathcal{E}$ retrieve contextually relevant knowledge in a coarse-to-fine retrieval cascade, not merely via vector similarity. This architecture enables flexible memory management, merging and deduplication of overlapping experiences, and support for diverse reasoning abstractions.
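The three-level hierarchy and coarse-to-fine retrieval can be sketched as follows. The field names, the keyword-overlap scorer, and the exact-text deduplication rule are assumptions made for illustration; the paper specifies only the level/zone structure and the cascaded retrieval.

```python
# Hedged sketch of the hierarchical experience library described above.
from dataclasses import dataclass, field

LEVELS = ("strategic", "methodology", "instance")

@dataclass
class Experience:
    level: str                         # one of LEVELS
    zone: str                          # "golden" (success) or "warning" (failure)
    text: str
    tags: set = field(default_factory=set)

class ExperienceLibrary:
    def __init__(self):
        self.entries = []

    def add(self, exp):
        # Toy deduplication: skip exact-text duplicates at the same level.
        if not any(e.level == exp.level and e.text == exp.text
                   for e in self.entries):
            self.entries.append(exp)

    def retrieve(self, query, per_level=2):
        """Coarse-to-fine cascade: score entries by keyword overlap with the
        query, keeping the top matches at each abstraction level in order."""
        words = set(query.lower().split())
        results = []
        for level in LEVELS:           # strategic first, instance last
            scored = [(len(words & set(e.text.lower().split())), e)
                      for e in self.entries if e.level == level]
            scored.sort(key=lambda pair: -pair[0])
            results.extend(e for score, e in scored[:per_level] if score > 0)
        return results
```

A production version would replace the keyword scorer with the semantic retrieval the paper describes, but the cascade ordering (strategic before instance) is the structural point.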
3. Reflection and Library Evolution Mechanisms
During interaction, FLEX employs an actor–critic architecture within a Markov Decision Process (MDP). The actor ($\pi$) samples multiple trajectories per query via parallel rejection sampling and iterative self-refinement under critic feedback. The critic then:
- Distills correct trajectories into concise, reusable golden experiences.
- Diagnoses errors in incorrect trajectories, abstracting them as improvement suggestions ("warnings").
Both golden and warning meta-actions are synthesized into $\varepsilon_t$, the candidate updates for the experience library. The meta-updater $\mu$ then merges, inserts, or prunes entries, evolving $\mathcal{E}$ without gradient signals:

$$\mathcal{E}_{t+1} = \mu(\mathcal{E}_t, \varepsilon_t)$$
FLEX Algorithm (pseudocode overview):
```
Initialize experience library E ← ∅
for each epoch:
    for (X_i, Y_i) in data:
        generate T trajectories τ_{i,t} using π and critic
        for each τ_{i,t}:
            distill ε⁺ (success) or ε⁻ (failure)
        update E ← μ(E, union of all ε)
return E
```
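The pseudocode above translates directly into a runnable loop. In this sketch the actor, critic, and meta-updater are toy stand-ins (in FLEX they are LLM calls); only the control flow is faithful to the algorithm.

```python
# Runnable sketch of the FLEX training loop; actor/critic/updater are
# placeholder callables, not the paper's LLM-based components.
import random

def flex_train(data, actor, critic, updater, n_trajectories=3, epochs=2, seed=0):
    rng = random.Random(seed)
    library = []                                   # E <- empty
    for _ in range(epochs):
        for x, y in data:
            candidates = []
            for _ in range(n_trajectories):        # parallel rejection sampling
                trajectory = actor(x, library, rng)
                verdict = critic(trajectory, y)    # distill eps+ or diagnose eps-
                if verdict is not None:
                    candidates.append(verdict)
            library = updater(library, candidates) # E <- mu(E, union of eps)
    return library
```

The key property the sketch preserves is that `library` is the only state that changes across epochs; the actor itself is never updated.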
This process ensures that FLEX learns exclusively through forward accumulation and curation of structured experience.
4. Information-Theoretic and Empirical Scaling Properties
FLEX’s information-theoretic underpinnings interpret library construction as a process that seeks to minimize the model’s posterior entropy with respect to ground-truth outcomes:

$$\mathcal{E}^* = \arg\min_{\mathcal{E}} \; H\big(Y \mid X, R(\mathcal{E}, X)\big)$$
Empirically, a scaling law emerges: as the size of $\mathcal{E}$ increases, both training and test accuracies exhibit predictable, power-law improvements (e.g., on GSM8K, accuracy rises steadily as $|\mathcal{E}|$ grows from $1{,}000$ to $1{,}900$ entries), while library growth itself is governed by a logistic curve over training epochs. These results establish that experience accumulation yields diminishing but predictable returns in performance.
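A power-law claim of this kind can be checked with a least-squares fit in log-log space. The sketch below uses purely illustrative synthetic points, not the paper's GSM8K measurements.

```python
# Fit error ~ a * size^b by linear regression in log-log space.
import math

def fit_power_law(sizes, errors):
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Ordinary least squares slope and intercept in log space.
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b

# Illustrative synthetic data: error falling as size^(-0.5).
sizes = [1000, 1300, 1600, 1900]
errors = [0.20 * (s / 1000) ** -0.5 for s in sizes]
a, b = fit_power_law(sizes, errors)   # recovers the exponent b ≈ -0.5
```

A near-linear log-log plot (equivalently, a stable fitted exponent `b`) is the standard diagnostic for the power-law regime the paper reports.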
5. Experience Inheritance and Portability
Because knowledge is encoded in the external library rather than model parameters, FLEX enables direct transfer—termed "experience inheritance"—across agents with potentially different architectures. Empirical results demonstrate that importing a library constructed by a strong agent into a weaker one (or vice versa) provides substantial performance improvements:
| Domain | Baseline Acc. | FLEX-Inherited Acc. | Delta |
|---|---|---|---|
| AIME25 (Math) | 50.0% | 63.3% | +13.3 pp |
| USPTO50k | 12.0% | 18.0% | +6.0 pp |
| ProteinGym | 51.5% (ρ) | 56.6% (ρ) | +5.1 |
This supports the premise that $\mathcal{E}$ encodes model-agnostic, generalizable strategies and that cross-model "distillation" is feasible.
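Because the library is plain text, inheritance reduces to serializing one agent's library and merging it into another's. The JSON schema below is an assumption made for illustration; the paper does not prescribe a serialization format.

```python
# Sketch of experience inheritance via a hypothetical JSON interchange format.
import json

def export_library(entries):
    """Serialize experiences (dicts with level/zone/text) to a JSON string."""
    return json.dumps({"version": 1, "entries": entries}, indent=2)

def import_library(payload, existing=None):
    """Load an exported library, merging with and deduplicating against any
    experiences the receiving agent already holds."""
    incoming = json.loads(payload)["entries"]
    merged = list(existing or [])
    seen = {e["text"] for e in merged}
    for e in incoming:
        if e["text"] not in seen:
            merged.append(e)
            seen.add(e["text"])
    return merged
```

Nothing in the transfer references the donor model's architecture, which is exactly why a library built by a strong agent can be loaded by a weaker one (or vice versa).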
6. Empirical Validation and Component Analysis
Evaluation across mathematics (AIME25, GSM8K), chemical retrosynthesis (USPTO50k), and protein fitness prediction (ProteinGym) against vanilla LLM, in-context learning, and ReAct baselines demonstrates that FLEX confers large, sample-efficient performance improvements without any parameter tuning. Substantial absolute gains are reported on AIME25 and USPTO50k (in percentage points) and on ProteinGym (in Spearman's ρ). Component ablation studies confirm that each element (exploration, meta-level evolution, regression assistance) provides a measurable contribution to accuracy.
7. Limitations and Prospects
FLEX’s experience libraries can grow large, raising demands for efficient compression, summarization, and retrieval mechanisms to maintain real-time applicability. Certain domains (e.g., chemistry) may require representations beyond pure text, such as graphs or structured schemas. Open problems include automating the merging of domain-specific libraries into collective knowledge pools, developing advanced auditing and human-in-the-loop editing, and extending FLEX to even more interactive or lifelong learning environments. Continued work aims to elaborate these directions and further shift agent learning paradigms from parameter-centric to experience-centric modalities.