
Ecologically Rational Meta-Learned Inference

Updated 4 September 2025
  • ERMI is a framework that integrates ecological rationality with meta-learning to derive in-context inference strategies from ecologically valid tasks.
  • It leverages large language model-generated tasks and meta-training on diverse real-world distributions to enable flexible, adaptive learning without hand-tuned priors.
  • Empirical results demonstrate ERMI’s state-of-the-art performance and human-like generalization across function learning, categorization, and decision-making benchmarks.

Ecologically Rational Meta-learned Inference (ERMI) unifies the principles of ecological rationality—adapting to the statistical structure of real environments—with the meta-learning paradigm, which leverages training on large task distributions to acquire general-purpose inference strategies. ERMI algorithms internalize environmental regularities by using meta-learning on ecologically valid tasks, often generated via LLMs, thereby producing in-context learners that adapt flexibly to new situations without explicit heuristics or hand-tailored prior design. This approach advances both cognitive modeling—explaining human behavior as principled adaptation to ecological structure—and practical machine learning by yielding state-of-the-art performance on real-world benchmarks.

1. Theoretical Foundations and Definitions

Ecological rationality posits that cognitive mechanisms are adapted to environmental statistics; ERMI formalizes this by integrating ecological grounding with normative rational analysis (Jagadish et al., 28 Aug 2025). Instead of relying on hand-crafted priors or fixed heuristics, ERMI leverages meta-learning to discover inference rules from data reflecting naturalistic distributions. Meta-learning here refers to training a model—often a transformer-based in-context learner or a mixture-density network—across a vast ensemble of tasks, each capturing real-world statistical regularities.

Fundamental to ERMI is the transition from standard rational analysis, which designs optimal strategies from explicit models, to an ecological regime where such models are learned implicitly using large-scale synthetic or real data. For instance, in function and category learning experiments, tasks are generated using LLMs so their feature and outcome distributions mimic those encountered by humans in real settings (Jagadish et al., 28 Aug 2025, Jagadish et al., 2 Feb 2024). This process injects "ecological priors" into the task pool, ensuring that the meta-learned inference strategy aligns with true environmental statistics.
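To make the task-generation step concrete, the sketch below shows one way an ecological task pool could be assembled by prompting an LLM for realistic feature-label data. The prompt wording, the `query_llm` helper, and the parsing logic are illustrative assumptions, not the prompts or code used in the cited work.

```python
import csv
import io

def build_task_prompt(domain: str, n_features: int, n_rows: int) -> str:
    """Illustrative prompt asking an LLM for a realistic feature-label dataset.

    Hypothetical wording, not the prompt used by Jagadish et al.
    """
    return (
        f"Generate a small {domain} dataset as CSV with {n_features} "
        f"real-valued features (f1..f{n_features}) and a final column named "
        f"'label' containing 0 or 1. Use feature ranges, correlations, and "
        f"difficulty that resemble real-world data. "
        f"Output exactly {n_rows} rows plus a header."
    )

def parse_task(csv_text: str) -> list[tuple[list[float], int]]:
    """Parse the LLM response into (features, label) pairs for meta-training."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        label = int(row.pop("label"))
        features = [float(v) for v in row.values()]
        rows.append((features, label))
    return rows

# Hypothetical usage: `query_llm` stands in for any chat-completion API call.
# task_csv = query_llm(build_task_prompt("medical screening", n_features=4, n_rows=50))
# task = parse_task(task_csv)
```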

2. Meta-Learning Methodologies for Ecological Priors

ERMI methodology is characterized by a two-stage procedure:

  • Ecological Task Generation: Cognitive tasks (e.g., function learning, category assignment, decision making) are generated via LLMs using carefully designed prompts. The resulting data include feature-label pairs with realistic correlations, nonlinearity, sparsity, and difficulty profiles (Jagadish et al., 28 Aug 2025, Jagadish et al., 2 Feb 2024). These synthetic tasks match the ceiling accuracies, inter-feature correlations, and sparsity observed in empirical benchmarks.
  • Meta-Learning Across Task Distributions: Models are meta-trained on these ecologically valid tasks using objectives such as:

\ell = \sum_{t} -\log p_{\theta}\bigl(y_{t} \mid x_{1:t},\, y_{1:t-1}\bigr)

Here, $p_{\theta}$ denotes the model's conditional predictive distribution, with $\theta$ as its parameter vector. The process forces the learner to adapt not just to individual data points, but to the overall regularities present in the full ecological task distribution (Jagadish et al., 28 Aug 2025, Binz et al., 2023).
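As a concrete illustration of this objective, the following sketch meta-trains a small transformer-based in-context classifier on such a task distribution. The architecture, dimensions, and names are illustrative choices, not the configuration reported in the cited papers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InContextLearner(nn.Module):
    """Minimal autoregressive in-context classifier (illustrative only)."""

    def __init__(self, x_dim: int, n_classes: int, d_model: int = 64):
        super().__init__()
        self.embed = nn.Linear(x_dim + n_classes, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x, y_prev):
        # x: (batch, T, x_dim); y_prev: (batch, T, n_classes), one-hot of y_{t-1}
        h = self.embed(torch.cat([x, y_prev], dim=-1))
        mask = nn.Transformer.generate_square_subsequent_mask(h.shape[1])
        h = self.encoder(h, mask=mask)
        return self.head(h)  # logits for p_theta(y_t | x_{1:t}, y_{1:t-1})

def meta_loss(model, x, y, n_classes):
    """Negative log-likelihood of y_t given x_{1:t} and y_{1:t-1}.

    Averaged over the sequence; equivalent up to scale to the summed
    objective in the equation above.
    """
    y_onehot = F.one_hot(y, n_classes).float()
    # Shift labels so the model only sees y_{1:t-1} when predicting y_t.
    y_prev = torch.cat([torch.zeros_like(y_onehot[:, :1]), y_onehot[:, :-1]], dim=1)
    logits = model(x, y_prev)
    return F.cross_entropy(logits.reshape(-1, n_classes), y.reshape(-1))

# Meta-training loop (sketch): sample a batch of LLM-generated tasks per step.
# model = InContextLearner(x_dim=4, n_classes=2)
# opt = torch.optim.Adam(model.parameters(), lr=1e-4)
# loss = meta_loss(model, x_batch, y_batch, n_classes=2); loss.backward(); opt.step()
```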

Post-training, the ERMI network performs in-context adaptation, modifying its internal state without explicit weight updates, and thereby generalizes rapidly to new, structurally similar problems. In the context of inference primitives (e.g., MCMC proposals), compositional block-wise neural proposal networks are meta-trained over recurring motifs, enabling flexible application to previously unseen graph structures (Wang et al., 2017).
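Continuing the illustrative sketch above, in-context adaptation then amounts to a single forward pass over a support set plus a query, with no optimizer step; shapes and values below are arbitrary.

```python
# In practice `model` would be the meta-trained InContextLearner;
# a fresh instance is constructed here only so the snippet runs standalone.
model = InContextLearner(x_dim=4, n_classes=2)
model.eval()

with torch.no_grad():
    x_support = torch.randn(1, 8, 4)          # 8 labelled examples
    y_support = torch.randint(0, 2, (1, 8))
    x_query = torch.randn(1, 1, 4)            # 1 unlabelled query

    x_seq = torch.cat([x_support, x_query], dim=1)
    y_prev = torch.cat(
        [torch.zeros(1, 1, 2),                # no label precedes t = 1
         F.one_hot(y_support, 2).float()],
        dim=1,
    )
    logits = model(x_seq, y_prev)
    prediction = logits[:, -1].argmax(dim=-1)  # label predicted for the query
```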

3. Algorithms, Architectures, and Statistical Regularities

  • Neural Proposal Networks and White-box Algorithms: Mixture density networks or blockwise proposals can be meta-trained to approximate Gibbs conditionals for frequently occurring model motifs, yielding reusable inference primitives (Wang et al., 2017). In probabilistic programming, white-box meta-learned inference extracts structure from program code and applies dedicated neural modules per command, estimating posteriors and marginal likelihoods with high test-time efficiency (Che et al., 2021).
  • In-Context Sequence Models: Transformer or RNN-based models operate as meta-learners, processing sequences of input–output pairs to enable rapid generalization within new domains. This approach subsumes traditional Bayesian inference as a special case, with meta-learned models directly approximating posterior predictives:

p(x_{t+1} \mid x_{1:t}) = \int p(x_{t+1} \mid \mu)\, p(\mu \mid x_{1:t})\, d\mu

with learning objectives that directly maximize predictive log-likelihood over the ecological task ensemble (Binz et al., 2023, Jagadish et al., 28 Aug 2025); a worked example of this predictive integral appears after this list.

  • Statistical Regularity Internalization: Empirical studies confirm that LLM-generated tasks recapitulate benchmark data feature distributions—e.g., a predominance of linear mappings with positive slopes in function learning, sparsity in feature relevance for categorization, and realistic cue weights in decision making (Jagadish et al., 28 Aug 2025, Jagadish et al., 2 Feb 2024). The meta-training process internalizes these regularities into “ecological priors,” optimizing adaptation and generalization out-of-distribution.
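To unpack the posterior predictive integral referenced above, here is a minimal worked example for a toy conjugate Gaussian model, comparing the closed-form predictive density with a Monte Carlo approximation of the integral. The model and numbers are illustrative, not taken from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)

sigma2 = 1.0          # known observation variance
mu0, tau2 = 0.0, 4.0  # prior mean and variance for mu
x = rng.normal(2.0, np.sqrt(sigma2), size=10)  # observed x_{1:t}

# Closed-form posterior p(mu | x_{1:t}) for the conjugate Gaussian model.
t = len(x)
post_var = 1.0 / (1.0 / tau2 + t / sigma2)
post_mean = post_var * (mu0 / tau2 + x.sum() / sigma2)

# Posterior predictive p(x_{t+1} | x_{1:t}) = N(post_mean, post_var + sigma2),
# i.e. the integral over mu evaluated analytically.
pred_mean, pred_var = post_mean, post_var + sigma2

# Monte Carlo approximation of the same integral: sample mu from the posterior,
# then average the likelihood p(x_{t+1} | mu) at a query point.
x_query = 2.5
mus = rng.normal(post_mean, np.sqrt(post_var), size=100_000)
mc_density = np.mean(
    np.exp(-(x_query - mus) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
)
exact_density = (
    np.exp(-(x_query - pred_mean) ** 2 / (2 * pred_var))
    / np.sqrt(2 * np.pi * pred_var)
)
print(f"Monte Carlo: {mc_density:.4f}  closed form: {exact_density:.4f}")
```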

4. Empirical Validation and Comparison

ERMI demonstrates superior predictive performance and behavioral fidelity across multiple domains:

| Domain | ERMI Performance | Benchmarked Against |
| --- | --- | --- |
| Function Learning | Accurate interpolation, human-like extrapolation, lower MSE in extrapolation bias | Meta-learned models with linear priors (Jagadish et al., 28 Aug 2025) |
| Category Learning | Human-mimetic learning curves, task difficulty sensitivity, exemplar-strategy shift | RMC, GCM, PM, rule-based models, PFN, MI (Jagadish et al., 2 Feb 2024; Jagadish et al., 28 Aug 2025) |
| Decision Making | Adaptive weighting, quantitative model frequencies matching observed trial-level strategies | Single-cue, equal-weight, feedforward neural networks (Jagadish et al., 28 Aug 2025) |
  • Cognitive Task Experiments: ERMI captures human trial-by-trial choice patterns in 15 experiments, outperforming established cognitive models (e.g., RMC, GCM, rule-based, prototype-based, and feedforward networks) by both Bayesian model frequency and prediction accuracy (Jagadish et al., 28 Aug 2025, Jagadish et al., 2 Feb 2024).
  • Machine Learning Benchmarks: ERMI achieves state-of-the-art performance on OpenML-CC18 classification, with mean accuracy ~70.95% and mean rank 2.26, surpassing tabular architectures like XGBoost, TabPFN, and task-specific prior-fitted networks (Jagadish et al., 2 Feb 2024).
  • Generalization and Robustness: ERMI models successfully adapt to new unseen tasks, maintain robust performance under data scarcity, and avoid overfitting through exposure to high-entropy ecological priors.

5. Data Efficiency, Resource Constraints, and Ecological Rationality

Ecological rationality within ERMI is expressed by maximizing predictive performance under resource constraints (limited examples, distributed task structure). Meta-learning theory reveals a dichotomy: for classes with finite dual Helly number, a bounded number of samples per task suffices for vanishing error in downstream generalization as the number of tasks increases (Alon et al., 26 Nov 2024). The corresponding ERM error rate scales as:

\epsilon^{\mathrm{ERM}}(n, m) = O\bigl(1/n + (\log m)/m\bigr)

with $n$ the number of tasks and $m$ the number of samples per task.
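As a purely illustrative reading of this rate, the snippet below tabulates the bound as written for a few values of $n$ and $m$, showing how the contributions from the number of tasks and the per-task samples trade off; constants are suppressed, so only the relative scaling is meaningful.

```python
import math

# Illustrative evaluation of the O(1/n + (log m)/m) rate quoted above.
# Constants are ignored; only relative scaling across (n, m) is meaningful.
def erm_rate(n_tasks: int, m_samples: int) -> float:
    return 1.0 / n_tasks + math.log(m_samples) / m_samples

for n in (10, 100, 1000):
    for m in (5, 20, 100):
        print(f"n={n:5d}  m={m:4d}  rate ~ {erm_rate(n, m):.4f}")
```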

Implications include:

  • The need for aggregating diverse task experiences over amassing data in individual contexts.
  • Feasibility of few-shot adaptation in domains such as robotics, personalized medicine, and in-context neural networks, provided the meta-hypothesis family has bounded complexity.
  • “Resource-rationality” emerges as a consequence of meta-training on naturally constrained data distributions, allowing the system to operate efficiently under ecological limitations (Binz et al., 2023, Alon et al., 26 Nov 2024).

Experimental results in ecological system prediction show enhanced accuracy and robustness using five to seven times less data than traditional machine learning methods when leveraging meta-learned attractor reconstruction; architectures employing time-delayed feedforward networks further improve data efficiency in nonlinear population and climate models (Zhai et al., 2 Oct 2024).

6. Implications for Cognitive Modeling and Future Directions

ERMI reframes human cognition as adaptive evidence integration over ecological structure rather than rule-based heuristic application (Jagadish et al., 28 Aug 2025, Binz et al., 2023). Recurrent architectures trained via meta-learning are shown to numerically approximate Bayes-optimal agents and implement evidence updating as a fixed point of the meta-learning dynamics (Mikulik et al., 2020). This approach generalizes across prediction, exploration, and decision tasks, matching empirical human data and extending rational analysis to "large world" settings where explicit Bayesian modeling is intractable.

Key implications for future research:

  • Integration of contextual information and uncertainty weighting, producing predictions with uncertainty decaying as $O(1/N)$ (Maeda et al., 2020).
  • White-box inference algorithms offer fast and scalable posterior estimation by leveraging program structure, with advantages over traditional MCMC and HMC on challenging multimodal posteriors (Che et al., 2021).
  • Ecologically rational meta-learned inference represents a flexible modeling paradigm for complex systems with limited observability, including ecological, biomedical, and cognitive domains.

In summary, ERMI models internalize ecological priors via meta-learning, delivering inference procedures that are both theoretically optimal and environmentally adapted. This synthesis advances understanding in both cognitive science and machine learning, offering principled, scalable inference strategies in realistic settings where classical approaches may fail.