Equilibrium World Models

Published 22 Jun 2026 in econ.GN | (2606.23463v1)

Abstract: We introduce Equilibrium World Models (EWMs), a deep-learning method for globally solving dynamic stochastic models that feature rare disasters, binding constraints, and counterfactual states. Standard unsupervised neural-network-based solvers impose equilibrium conditions only on states generated by their own simulated policy. Their solutions can therefore be self-confirming: accurate on the simulated path, but untested off it, sensitive to initialization, and costly when expectations must be recomputed at each step. EWMs change the computational representation, not the economics. They enforce the model's exact equilibrium conditions on a broader, model-generated distribution of ordinary, rare, stressed, and counterfactual states. They carry the continuation with a learned surrogate, but certify the resulting policy strictly against the true equilibrium conditions. We provide an error decomposition, an off-path residual bound, and a convergence result linking self-confirming solutions to rational-expectations equilibria. We demonstrate EWMs through a sequence of test cases that isolate the main pathologies of classical deep-learning solvers and then scale them to richer economies. In a rare-disaster Brock–Mirman laboratory, coverage reduces disaster-region residuals by an order of magnitude. In a high-dimensional international real-business-cycle model, classical deep-learning solvers fail from all random starts, whereas EWMs converge from nearly all and evaluate continuations up to two orders of magnitude less often. When actions move transition measures, EWMs use action-conditioned continuations to recover the relevant policy margin. In a heterogeneous-agent economy with aggregate risk, EWMs compress the numerical representation of the wealth distribution by at least 25x while imposing exact full-distribution rational-expectations conditions.

Authors (2)

Summary

The paper introduces Equilibrium World Models to enforce equilibrium conditions in dynamic stochastic economies globally, addressing off-path errors.
It uses a novel coverage measure combining ergodic, stress, and local perturbation samples to rigorously certify solutions beyond typical trajectories.
Empirical evaluations show order-of-magnitude reductions in disaster-region residuals and up to 130× lower per-step continuation evaluation costs in high-dimensional models.

Equilibrium World Models: Certifiable Global Solutions for Dynamic Stochastic Economies

Motivation and Problem Formulation

Dynamic stochastic economic models underpin much of modern macroeconomic and financial analysis, especially when assessing the impact of rare events, binding constraints, and heterogeneous agents' responses. However, classical deep-learning-based solution techniques, including Deep Equilibrium Nets (DEQN), predominantly certify accuracy only along the simulated path induced by their own policy. As a result, these solvers often deliver "self-confirming" solutions: the equilibrium conditions are enforced only on states that the model's own simulated trajectory visits, leaving rare, counterfactual, and stress-test states uncertified and potentially erroneous. This lack of global certification is a serious limitation in applications where welfare, policy, and asset pricing hinge on off-path behavior (e.g., after crises or regime changes).

The Equilibrium World Model Architecture

Equilibrium World Models (EWMs) are introduced to address these pathwise certification failures while preserving structural economic discipline. The method operates within the exact structural model, never approximating or learning the transition law $\Gamma$. Rather, EWMs enforce the model's equilibrium conditions (including optimality, KKT complementarity, and market-clearing equations) on a model-generated coverage measure that deliberately spans ordinary, rare, stressed, perturbed, and counterfactual regions of the state space, well beyond the typical ergodic support.

This expanded enforcement set is constructed via a mixture of:

Ergodic (on-policy) path samples
Stress components (states with rare shocks or constraints forced active)
Local perturbations (small neighborhoods around the ergodic manifold)

The coverage measure is never a naive hypercube; every off-path state is generated or advanced via the model's exact law of motion.
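The three-part mixture described above can be sketched in a toy one-sector setting. This is a minimal illustration, not the paper's implementation: the calibration, the regime probabilities, the mixture weights, and all function names are hypothetical, and the textbook Brock–Mirman closed-form savings rule stands in for the neural policy being trained. The point it shows is that stress and local states are produced by advancing or perturbing on-policy states with the same law of motion, rather than by sampling a box.

```python
import random

ALPHA, BETA = 0.36, 0.96                 # hypothetical calibration
Z = {"normal": 1.0, "disaster": 0.7}     # hypothetical productivity levels
P_ENTER, P_STAY = 0.02, 0.5              # hypothetical disaster entry/persistence

def law_of_motion(k, regime):
    """Exact capital transition under a placeholder policy.

    The rule alpha*beta*z*k**alpha is the textbook Brock-Mirman closed form
    (log utility, full depreciation); it stands in for a trained policy.
    """
    return ALPHA * BETA * Z[regime] * k ** ALPHA

def draw_regime(regime, rng):
    p = P_STAY if regime == "disaster" else P_ENTER
    return "disaster" if rng.random() < p else "normal"

def ergodic_states(n, rng, burn_in=200):
    """On-policy simulation: the states a pathwise solver would train on."""
    k, regime, out = 0.2, "normal", []
    for t in range(burn_in + n):
        k, regime = law_of_motion(k, regime), draw_regime(regime, rng)
        if t >= burn_in:
            out.append((k, regime))
    return out

def stress_states(seeds, spell=3):
    """Force a disaster spell, still advancing capital with the exact law."""
    out = []
    for k, _ in seeds:
        for _ in range(spell):
            k = law_of_motion(k, "disaster")
        out.append((k, "disaster"))
    return out

def local_states(seeds, rng, width=0.1):
    """Small multiplicative perturbations around on-policy states."""
    return [(k * (1 + rng.uniform(-width, width)), r) for k, r in seeds]

def coverage_sample(n, weights=(0.6, 0.25, 0.15), seed=0):
    """Draw n labelled states from the ergodic/stress/local mixture."""
    rng = random.Random(seed)
    base = ergodic_states(n, rng)
    pools = {
        "ergodic": base,
        "stress": stress_states(base),
        "local": local_states(base, rng),
    }
    labels = rng.choices(list(pools), weights=weights, k=n)
    return [(label, *rng.choice(pools[label])) for label in labels]

sample = coverage_sample(500)
disaster_share = sum(r == "disaster" for _, _, r in sample) / len(sample)
```

With these illustrative weights, disaster states make up a much larger share of the coverage sample than they would along the simulated path alone, which is the intended effect of the stress component.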

Figure 2: Stylized schematic of the EWM coverage measure, showing the union of ergodic, stress, and local components structurally generated from the exact transition.

EWMs further utilize a learned surrogate (world model) for the expensive continuation terms required by the Euler equations, allowing for computational amortization while keeping accuracy explicitly audited against the true expectation.
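The amortize-then-audit pattern in the paragraph above can be sketched as follows. Everything here is an assumption made for illustration: the consumption rule, the two-point shock distribution, and the use of log-log interpolation on a grid as a stand-in for a neural surrogate. The sketch only shows the bookkeeping: the exact expectation is evaluated a fixed number of times to fit the surrogate, and the surrogate is then compared with the exact expectation on held-out points.

```python
import bisect
import math
import random

ALPHA = 0.36
NODES = [(1.0, 0.98), (0.7, 0.02)]   # hypothetical (z', probability) pairs
exact_calls = 0                      # counts evaluations of the exact expectation

def consumption(k, z):
    """Hypothetical next-period consumption rule, for illustration only."""
    return 0.6 * z * k ** ALPHA + 0.02

def exact_continuation(k_next):
    """Exact expectation over the shock nodes (the expensive object)."""
    global exact_calls
    exact_calls += 1
    return sum(p * ALPHA * z * k_next ** (ALPHA - 1) / consumption(k_next, z)
               for z, p in NODES)

def fit_surrogate(k_lo, k_hi, n_grid):
    """Stand-in for a learned surrogate: log-log linear interpolation."""
    lo, hi = math.log(k_lo), math.log(k_hi)
    xs = [lo + i * (hi - lo) / (n_grid - 1) for i in range(n_grid)]
    ys = [math.log(exact_continuation(math.exp(x))) for x in xs]

    def surrogate(k_next):
        x = math.log(k_next)
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n_grid - 2)
        t = (x - xs[i]) / (xs[i + 1] - xs[i])
        return math.exp((1 - t) * ys[i] + t * ys[i + 1])

    return surrogate

def audit(surrogate, points):
    """Largest relative gap between surrogate and exact continuation."""
    return max(abs(surrogate(k) / exact_continuation(k) - 1) for k in points)

surrogate = fit_surrogate(0.05, 0.5, n_grid=40)
calls_for_fit = exact_calls          # exact evaluations spent on fitting

rng = random.Random(1)
held_out = [rng.uniform(0.05, 0.5) for _ in range(200)]
gap = audit(surrogate, held_out)     # surrogate gap on held-out states
```

After fitting, any number of training steps can query `surrogate` without touching the exact expectation again; the audit is the only place where the exact object is re-evaluated.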

Theoretical Framework: From Self-Confirmation to Rational Expectations

A key theoretical contribution is formalizing the residual decomposition on arbitrary test measures, separating the total equilibrium residual into:

Optimization error (OptErr): In-coverage residual minimized during training
Perception-class error (ApproxErr): Due to function approximation limits of the world model
Surrogate-fit error (SurrFitErr): Due to the mismatch between the surrogate and the true continuation
Coverage error (CoverageErr): The total variation (TV) distance between where the solution is certified and where it is evaluated

Crucially, the dominant off-path error for classical solvers stems from the coverage term, which EWMs target directly by design. In the limit of increasing coverage reach and world-model capacity and amortization, EWMs converge to rational-expectations equilibria on the audited region.
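The role of the total variation term can be illustrated with a generic inequality rather than the paper's own bound, whose exact form is not reproduced in this summary. For a nonnegative function $f$ bounded by $\sup f$ and two distributions $\mu$ (enforcement) and $\nu$ (evaluation), $\mathbb{E}_\nu[f] \le \mathbb{E}_\mu[f] + \sup f \cdot \mathrm{TV}(\mu,\nu)$. The regions, weights, and squared residuals below are hypothetical numbers chosen only to show how the bound tightens as the enforcement measure moves toward the evaluation measure.

```python
def tv_distance(p, q):
    """Total variation distance between two discrete distributions (dicts)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def mean_under(dist, f):
    """Expectation of f (dict: region -> value) under dist."""
    return sum(w * f[s] for s, w in dist.items())

def off_path_bound(enforce, target, f):
    """Generic bound: E_target[f] <= E_enforce[f] + sup(f) * TV."""
    return mean_under(enforce, f) + max(f.values()) * tv_distance(enforce, target)

# Hypothetical squared residuals by region of the state space.
residual_sq = {"normal": 1e-6, "mild_stress": 4e-6, "disaster": 1e-2}

# Hypothetical measures: where conditions are enforced vs. where they are read.
pathwise = {"normal": 0.97, "mild_stress": 0.03}
covered = {"normal": 0.6, "mild_stress": 0.2, "disaster": 0.2}
target = {"normal": 0.5, "mild_stress": 0.2, "disaster": 0.3}

tv_pathwise = tv_distance(pathwise, target)   # 0.47 up to rounding
tv_covered = tv_distance(covered, target)     # 0.10 up to rounding
actual = mean_under(target, residual_sq)
bound_pathwise = off_path_bound(pathwise, target, residual_sq)
bound_covered = off_path_bound(covered, target, residual_sq)
```

The residuals are held fixed here to isolate the coverage term. In practice, enforcing the conditions on disaster states would also be expected to lower the residual there, which this sketch does not model.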

Empirical Evaluation: Numerical Experiments and Diagnostics

EWMs are evaluated through a series of experiments that isolate specific pathologies of pathwise (DEQN-type) solvers and demonstrate scalable, certifiable global solutions in both low- and high-dimensional settings.

Brock–Mirman with Rare Disaster

Introducing a persistent, rare disaster and investment irreversibility into the classical Brock–Mirman model yields a pronounced coverage gap: a pathwise solver is highly accurate in normal regions but can be ten times worse in the disaster region, failing precisely where policy analysis is most economically significant. Imposing equilibrium conditions via coverage states generated under rare disasters reduces the disaster-region residual by an order of magnitude.
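A regime-by-regime Euler residual audit of this kind can be sketched on the textbook Brock–Mirman model, where log utility and full depreciation give the closed-form savings rate $\alpha\beta$. The sketch omits the irreversibility constraint, and the calibration, the transition probabilities, and the "distorted" policy are hypothetical; the distorted policy is simply constructed to be correct in normal times and wrong in disasters, mimicking a solver that was never trained there.

```python
ALPHA, BETA = 0.36, 0.96                 # hypothetical calibration
Z = {"normal": 1.0, "disaster": 0.7}     # hypothetical productivity levels
TRANSITION = {                           # hypothetical regime transition matrix
    "normal": {"normal": 0.98, "disaster": 0.02},
    "disaster": {"normal": 0.5, "disaster": 0.5},
}

def exact_rate(regime):
    """Closed-form savings rate (log utility, full depreciation)."""
    return ALPHA * BETA

def distorted_rate(regime):
    """Hypothetical pathwise-style policy: right in normal times only."""
    return ALPHA * BETA * (0.8 if regime == "disaster" else 1.0)

def euler_residual(k, regime, rate):
    """Unit-free Euler error: 1 - beta * c * E[ f'(k') / c' ]."""
    y = Z[regime] * k ** ALPHA
    k_next = rate(regime) * y
    c = y - k_next
    expectation = 0.0
    for nxt, prob in TRANSITION[regime].items():
        y_next = Z[nxt] * k_next ** ALPHA
        c_next = y_next - rate(nxt) * y_next
        marginal_product = ALPHA * Z[nxt] * k_next ** (ALPHA - 1)
        expectation += prob * marginal_product / c_next
    return 1.0 - BETA * c * expectation

k = 0.2
exact_errors = {r: euler_residual(k, r, exact_rate) for r in Z}
distorted_errors = {r: euler_residual(k, r, distorted_rate) for r in Z}
```

Under the exact rule the residual is zero in both regimes up to floating-point error. Under the distorted rule it stays small in the normal regime, where the disaster branch carries little probability weight, and becomes large in the disaster regime, which is the kind of gap a path-only audit would not reveal.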

Figure 4: Disaster-regime residuals in Brock–Mirman with rare disaster. Coverage-based certification sharply reduces off-path errors relative to pathwise training.

International Real Business Cycle (IRBC) Model

In an $N$-country IRBC setting with dimensions up to $N=32$ (state dimension $65$), classical pathwise solvers consistently fail to converge from random initializations and deliver large errors in disaster regimes. By contrast, EWMs:

Achieve disaster-region residuals, verified against the exact equilibrium conditions, of the same order of magnitude as on-path errors
Converge reliably from nearly all random seeds (e.g., $10/10$ at $N=4$)
Cut per-step continuation evaluation costs by up to $130\times$ through amortization via the surrogate world model
Match or surpass the accuracy of surrogate-free coverage controls at a fraction of the computational cost

Figure 6: Seed-level certification results at $N=2$ and $N=4$, highlighting the robust, diagonal convergence of EWMs across random initializations.

Figure 1: As world model surrogate capacity increases, both the surrogate gap and the disaster residual in IRBC decline monotonically, demonstrating the efficacy of the amortized continuation.

Heterogeneous-Agent (Bewley) Economy

In a model where the aggregate state is the entire cross-sectional wealth distribution, EWMs with a distributional encoder replace hand-picked moments (as in Krusell–Smith) with a learned embedding, while still evaluating and enforcing equilibrium on the full distribution. Among policies passing the same exact equilibrium audit, only the learned embedding accurately recovers the decision-relevant cross-sectional shape, as revealed by linear probing of key wealth statistics.

Figure 3: The learned embedding maintains critical distributional features (e.g., variance, Gini coefficient) needed for accurate macroeconomic decision making, unlike hand-crafted moment-based summaries.
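Why a moment-based summary can miss decision-relevant shape is easy to see in a small example. The two wealth samples below are made up, and summarizing a distribution by its mean alone is a deliberate simplification of hand-picked-moment approaches; the sketch shows two cross-sections that share the same mean while differing sharply in variance and Gini coefficient, the statistics named in the figure caption.

```python
def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def gini(xs):
    """Gini coefficient via the sorted-rank formula (nonnegative data)."""
    xs = sorted(xs)
    n = len(xs)
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * sum(xs)) - (n + 1) / n

# Two hypothetical wealth cross-sections with the same mean.
equal_wealth = [4.0, 4.0, 4.0, 4.0]
unequal_wealth = [1.0, 1.0, 2.0, 12.0]

summary = {
    name: (mean(w), variance(w), gini(w))
    for name, w in (("equal", equal_wealth), ("unequal", unequal_wealth))
}
```

A summary that keeps only the mean treats these two economies as the same aggregate state, even though a large share of households in the second one sits near the bottom of the distribution.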

Identification of Action-Conditioned World Models

The flexibility of EWMs to implement action-conditioned surrogates is crucial when the policy alters the probability law of future regimes (e.g., endogenous disaster protection). In such cases, only an action-conditioned world component can identify and capture the relevant policy margin, whereas a state-only model collapses the effect to zero.

Figure 5: In an endogenous-protection setting, only action-conditioned world models recover the correct normal-regime protection value; state-only surrogates fail to identify the margin entirely.
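The identification point can be made concrete with a two-regime toy problem. The continuation values, the baseline disaster probability, and the exponential protection technology are all hypothetical. The sketch compares the marginal value of protection implied by a continuation that depends on the action with one that depends on the state only, the latter being an average over whatever actions appeared in training.

```python
import math

V_NORMAL, V_DISASTER = 10.0, 4.0     # hypothetical continuation values
P0 = 0.05                            # hypothetical baseline disaster probability

def disaster_prob(a):
    """Hypothetical technology: protection effort a >= 0 lowers disaster risk."""
    return P0 * math.exp(-a)

def w_action(a):
    """Action-conditioned continuation W(s, a) at a fixed state s."""
    p = disaster_prob(a)
    return (1 - p) * V_NORMAL + p * V_DISASTER

def w_state_only(actions_seen):
    """State-only continuation W(s): an average over actions seen in training."""
    avg = sum(w_action(a) for a in actions_seen) / len(actions_seen)
    return lambda a: avg             # constant in the candidate action

def marginal_value(w, a, h=1e-5):
    """Central finite difference of the continuation with respect to a."""
    return (w(a + h) - w(a - h)) / (2 * h)

a = 0.5
mv_action = marginal_value(w_action, a)
mv_state = marginal_value(w_state_only([0.0, 0.5, 1.0]), a)
analytic = P0 * math.exp(-a) * (V_NORMAL - V_DISASTER)
```

Because the state-only continuation is flat in the action, any positive marginal cost of protection would push the implied choice to zero, which is the collapse described above.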

Practical and Theoretical Implications

EWMs enable global certification of equilibrium solutions in high-dimensional models with rare events, strong nonlinearities, and distributional state variables, without approximating or learning the structural transition or the equilibrium residuals. The implications include:

Robust policy analysis: Certifiable solutions are available even under rare shocks or in the tails, critical for regulatory, welfare, and risk evaluations.
Computational tractability: Amortization of continuation computations enables global solves at scales ($N=32$) previously unattainable under exact integration.
True off-path certification: The explicit separation and control of the coverage error provide an auditable guarantee of solution quality in all audited regions.
Flexibility in economic structure: The architecture is modular enough to cover heterogeneous agents, non-Markov aggregate states, and actions affecting probability laws.

Theoretical results provide sufficient conditions for convergence to the rational-expectations solution and clarify the precise role of coverage (support expansion) and world-model capacity in closing the certification gap.

Prospects for Future Development

Open directions include:

Adaptive and active coverage: Designing procedures that tailor the coverage measure automatically in response to diagnostics, rather than relying on manually designed schedules.
Sharper residual and audit diagnostics: Extending the certification to richer classes of counterfactuals, including continuous-time and path-dependent settings, and developing sharper off-support error controls.
Extension to richer economic and financial models: Integrating EWM with mean-field games, extensive heterogeneous-agent settings, or continuous action and state spaces.

Conclusion

Equilibrium World Models present a principled, certifiable approach to solving complex dynamic stochastic general equilibrium models under rare disasters, constraints, and high-dimensional state spaces. By expanding the domain of certification and providing computational amortization without sacrificing structural discipline, EWMs offer a robust foundation for economic policy evaluation and large-scale simulation. The key lesson is that reliable, off-path accuracy in policy-relevant regions is neither a given nor a natural consequence of flexible function approximation: explicit certification via model-based coverage and disciplined world modeling is both necessary and feasible in practice (2606.23463).

Explain it Like I'm 14

What is this paper about?

This paper introduces Equilibrium World Models (EWMs), a new way to use deep learning to solve complex economic models more reliably. These models are about how the economy changes over time, especially in unusual times like financial crises or big disasters. The main idea is to make sure the computer’s solution works not only in “normal” situations, but also in rare or stressful ones—because that’s exactly when we care most about what the model says.

The big questions

The authors ask:

How can we train a model so it’s accurate not only on the usual path the economy follows, but also in rare, risky, or “what if” situations (like a disaster or a policy change)?
How can we keep the strong economic rules of the original model (so we don’t “cheat” with a shortcut), while making the computation fast and stable?
Can we prove that this approach moves us from a “self-confirming” solution (works only where it was trained) toward a full “rational expectations” solution (works wherever the model says the economy could realistically go)?

How do they do it?

Think of a driving school. A basic simulator might only practice smooth highway driving. You’ll look great there, but you might fail on ice. EWMs say: let’s also practice on icy roads, heavy rain, and emergency stops—situations the driver might rarely see, but must be ready for. And importantly, the simulator follows real physics, not a made-up game.

Here’s how the method works in simple terms:

The coverage idea: Train where it matters, not just where you’ve been

Usual deep-learning solvers only train on states the model itself visits in a normal simulation. That’s like only practicing sunny driving.
EWMs deliberately generate a wider set of states using the model’s own rules of motion (its “physics”): the ordinary states, plus rare, stressed, and locally perturbed states, and counterfactuals (what would happen if a big shock hit, or a policy changed?).
The model is then forced to satisfy the original economic equations on this larger “coverage” set. This is key: the economic rules are enforced exactly, just on a broader, carefully chosen set of situations.

Why this matters: Training only on your own path can create a “self-confirming” solution: it looks right where you checked it, but may fail where you didn’t. Expanding coverage fixes that by testing where you will actually read the solution—like disaster regions and policy counterfactuals.

Looking ahead cheaply: A learned surrogate for “the future”

These models need to look into the future (for example, “What is my expected payoff next period?”). Doing that exactly at every training step is expensive.
EWMs learn a small helper function (a surrogate) that quickly predicts this “look-ahead” part. This saves time, like using a fast calculator instead of redoing a long sum each time.
Crucially, they still audit the helper against the exact model. The surrogate speeds things up but never changes the economic rules or the final accuracy check.

When actions change the future’s randomness: Condition on the action

Sometimes choices today change not just tomorrow’s level, but the probabilities of future outcomes (for example, paying for protection that reduces disaster risk).
If you only look at state variables (like “how much capital is there?”), you miss this margin. EWMs fix this by making the look-ahead depend on both the state and the action—similar to a Q-function in reinforcement learning—so the model “sees” the real choice it has to make.

Many people at once: Compressing a whole distribution

In models with many households, the state is the entire wealth distribution—too big to handle directly.
EWMs learn a compact summary (an “encoder”) of the distribution that keeps what matters for decisions.
But they still push forward the full distribution using the exact model and test the full rational-expectations conditions. So the summary helps the computer, while the economics remains exact.

Always audited against the true model

The transition rules (how the world moves), the economic conditions (optimality, constraints, and market clearing), and the final accuracy measure are the exact ones from the original model.
The learned parts are tools to make training affordable and robust. The final “grade” is always based on the exact equations, on held-out states the model didn’t practice on.

What did they find?

The authors test EWMs in a sequence of models, including small “microscope” cases designed to isolate common failure points of standard deep-learning solvers.

Here is a brief summary of the results:

Rare-disaster growth model (Brock–Mirman with irreversible investment):
- Problem: After a rare productivity crash, an investment constraint binds—exactly where standard training hasn’t practiced much.
- Result: EWMs, by enforcing the equations in disaster regions, cut errors there by about a factor of 10. A simple “coverage-only” version (without the learned future surrogate) gets almost all the gain, showing that wider training support is the main lever.
High-dimensional international business cycle model (many countries, rare disasters, irreversible investment):
- Problem: The state has many dimensions; rare disasters are off the usual path; constraints create kinks that are hard to extrapolate to.
- Result: Classical deep-learning solvers failed from all random starts in the reported setup; EWMs converged from nearly all starts. They also needed far fewer expensive “look-ahead” evaluations (up to about 130 times fewer), and kept errors small in disaster regions. Economically, the EWM solutions consistently priced disaster risk, while the pathwise solutions mispriced it where they weren’t certified.
Diagnostic variant with protection choice (actions change transition probabilities):
- Problem: If the future’s randomness depends on your action (like buying protection), a state-only look-ahead ignores the key decision margin.
- Result: A state-only surrogate collapses the protection choice to zero; an action-conditioned surrogate recovers the correct behavior and passes the exact residual audit.
Heterogeneous-agent model with aggregate risk (state is the wealth distribution):
- Problem: The state is a whole distribution—huge to compute with.
- Result: The learned encoder compresses the distribution by at least 25× while still satisfying the exact full-distribution rational-expectations conditions during auditing. With the given compute budget, the learned summary recovers the decision-relevant shape of the cross-section, whereas handpicked moments do not.

Theory highlights:

They decompose the error into parts, including a “coverage” term that captures how far training support misses the states you care about.
They prove an off-path residual bound: accuracy off the usual path can be guaranteed by quantities you can measure on the coverage set.
They show a convergence path: as coverage expands and approximation errors shrink, solutions move from self-confirming (only right on their own path) toward rational expectations (right on the reachable region you certify).

Why it matters

Better crisis analysis: Policies are often judged in rare, stressful times. EWMs train and certify solutions right where those decisions matter.
Reliability and efficiency: EWMs are built to be robust (less dependent on lucky starts and more likely to converge) and cheaper to run at scale, without changing the underlying economics.
Clearer certification: The method states exactly where the solution is valid (the coverage set), and proves accuracy with the model’s own equations. No “black box” shortcuts define the solution.
Scales to richer models: EWMs handle many countries, rare disasters, binding constraints, and even full wealth distributions by combining exact economics with smart computational tools.
A bridge to rational expectations: By widening coverage, EWMs provide a practical, provable way to move from “works where you trained” to “works where the economy could go,” which is the standard economists aim for.

In short, EWMs keep the economics exact but change where and how the computer practices, so the final solution is trustworthy not just on sunny days, but also in the storms economists care about.

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Below is a consolidated list of concrete gaps and unanswered questions highlighted or implied by the paper. Each item is posed to guide future research toward targeted, testable advances.

Coverage design and tuning:
- How should the coverage measure μ_κ be designed in a principled, model-agnostic way (choice of stress seeds, local perturbation kernels, and mixing weights ρ_i) to maximize certification where policies will be queried?
- How to set and adapt the reach parameter κ online (e.g., curriculum/annealing schedules) to balance optimization stability, off-path certification, and computational cost?
- How to ensure coverage excludes unreachable or economically irrelevant states yet still spans binding-constraint and tail regions that matter for counterfactuals?
- What diagnostics reliably indicate inadequate coverage (e.g., residual anisotropy, tail failures) and how to adaptively enrich μ_κ in response?
Convergence theory and guarantees:
- Precise assumptions under which the coverage-indexed fixed point converges to a rational-expectations equilibrium on the audited region (e.g., contraction, monotonicity, single-valuedness) are not fully characterized; what classes of models satisfy them, and what are the necessary vs. sufficient conditions?
- Finite-capacity and finite-sample guarantees are missing: what convergence rates hold given bounded network size, finite training iterations, and quadrature discretization?
- How does non-smoothness from occasionally binding constraints (kinks) affect convergence, especially with Fischer–Burmeister residuals and gradient-based training?
- In the presence of multiple equilibria, what selection is induced by different coverage schedules/initializations, and can selection be controlled?
Off-path residual bounds and economic accuracy:
- The off-path residual bound’s tightness, dependence on the distance between enforcement and target measures, and its translation into economically meaningful errors (welfare losses, asset-pricing errors, policy mistakes) remain unquantified.
- How do residual-based certificates map to bounds on decision rules and outcomes (e.g., sup-norm or Lipschitz constants connecting residuals to policy/value errors)?
Surrogate continuation W (state-only and action-conditioned):
- When is an action-conditioned surrogate necessary, and can one provide an operational test to detect action–distribution coupling that invalidates state-only surrogates?
- How does surrogate approximation error propagate into the structural residual and ultimately into policy and price errors? Can one derive certified error budgets that combine surrogate and coverage errors?
- What active-learning or uncertainty-quantification strategies (e.g., Bayesian surrogates, variance-based acquisition) best target expensive exact-continuation evaluations to tighten guarantees?
- How to scale action-conditioned surrogates with high-dimensional action spaces and market-clearing variables without exploding sample complexity?
Encoder for distributions (heterogeneous-agent economies):
- What conditions ensure a learned low-dimensional embedding is sufficient for rational-expectations decisions (i.e., preserves all decision-relevant information), and how can sufficiency be tested or certified ex post?
- How stable is the encoder across policy iterations and coverage shifts (distribution drift)? Does the embedding generalize after policy updates or structural shocks, or does it require periodic retraining?
- How to choose the encoder dimension systematically (bias–variance–compute trade-off) and interpret encoder features economically?
- What are failure modes where the encoder collapses decision-relevant heterogeneity (e.g., fat-tailed wealth distributions, mass points at constraints)?
Robustness and stability:
- Despite improved convergence, how sensitive is EWM to random seeds, optimizer hyperparameters, and network architectures relative to pathwise baselines? Are there principled tuning rules?
- Does enforcing residuals over broader coverage degrade on-path accuracy or induce trade-offs (frontier between on-path and off-path performance) that should be explicitly managed?
Stress weighting and rare events:
- How should rare/stress regions be weighted in μ_κ relative to their true probabilities to balance certification with economic relevance (e.g., pricing of disaster risk vs. average performance)?
- How do different stress-weighting schemes affect asset pricing, welfare calculations, and policy evaluations produced by the certified policy?
Auditing and validation:
- Current audits evaluate exact residuals on held-out states drawn from the (designed) coverage. How should audits be expanded to include:
  - target ergodic distributions under the learned policy,
  - stress tests under alternative policies/counterfactuals,
  - adversarial or worst-case coverage perturbations?
- Benchmarks against known global solutions are shown for small models; systematic quantitative validation in larger models (beyond residuals) remains to be developed.
Scalability and compute:
- What are the asymptotic compute and memory costs as state dimension, shock dimension, and the number of countries/agents grow, especially with many quadrature nodes and action-conditioned surrogates?
- How do coverage generation and world-component training overheads scale vs. saved continuation evaluations, and where are the break-even points?
Handling constraints and non-smoothness:
- The Fischer–Burmeister transform is Lipschitz but non-differentiable at the origin; how does this affect training dynamics and gradient quality in large systems with many complementarity pairs?
- Are alternative smoothing or penalty formulations preferable in practice, and how do they affect certification and convergence?
Applicability beyond the maintained setting:
- The method assumes a known, exact structural transition Γ. How can EWM be extended to:
  - models with parameter uncertainty (robust or Bayesian EWM),
  - partially unknown transitions (semi-structural world models),
  - misspecified environments (link to Berk–Nash or structural learning)?
- Extensions to continuous-time settings, models with path-dependent state (e.g., habits, durable capital adjustment costs), and nested expectations (e.g., Epstein–Zin) are not addressed.
Dynamic games and equilibrium selection:
- How does EWM extend to dynamic games with strategic interactions, belief consistency across agents, and multiple equilibria? Can coverage be designed to select equilibria with desirable properties?
Integration with optimal policy (Ramsey) analyses:
- How to couple EWM with an outer optimization over policy instruments (bilevel problems) while maintaining coverage-based certification and computational stability?
Box/hypercube sampling as placebo:
- The paper argues boxes fail in higher dimensions; a systematic empirical characterization of the dimension threshold where structurally generated coverage overtakes uniform-box sampling is not provided.
- Can hybrid strategies (structural seeds + stratified box samples) offer practical benefits in moderate dimensions?
Practical implementation choices:
- Clear guidance is missing on:
  - choosing weights w_i across residual blocks,
  - selecting and discretizing quadrature rules (nodes, accuracy),
  - normalizing/rescaling states and residuals for conditioning,
  - setting early-stopping and tolerance criteria that align with economic error targets.
Generalization and distribution shift:
- How do policies certified on μ_κ perform under shock distributions that differ from training (e.g., heavier tails, structural regime changes)? Are there principled ways to make certification robust to such shifts?
Avoiding enforcement on infeasible/unreachable states:
- While stresses are generated via Γ, local perturbations risk entering states that are economically infeasible or unreachable in finite time. How to guarantee coverage respects feasibility and maintains policy plausibility?
Diagnostics beyond residuals:
- Residuals are necessary but not sufficient. Standardized reporting of induced errors in prices, risk premia, Euler equation wedges, impulse responses, and welfare would provide a clearer link to economic significance.
Interplay with sequence-space and alternative solvers:
- Can EWM be combined with sequence-space conditioning (stabilizing exogenous shocks) or other residual-learning variants? What are principled hybrid designs and when do they dominate?
Non-stationary and learning economies:
- For models with adaptive beliefs or evolving structures (beyond rational expectations), how should coverage and world components be adapted to maintain meaningful certification?
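Several items above refer to Fischer–Burmeister residuals for complementarity conditions. For reference, a minimal sketch of the standard function follows, using the sign convention $\phi(a,b) = a + b - \sqrt{a^2 + b^2}$; which convention and scaling the paper uses is not stated in this summary, so treat that choice as an assumption. The function is zero exactly when $a \ge 0$, $b \ge 0$, and $ab = 0$, so a pair such as (investment, multiplier on the irreversibility constraint) can be written as a single equation.

```python
import math

def fischer_burmeister(a, b):
    """phi(a, b) = a + b - sqrt(a^2 + b^2); zero iff a >= 0, b >= 0, a*b == 0."""
    return a + b - math.hypot(a, b)

# Complementarity holds: one side is zero, the other is nonnegative.
slack_case = fischer_burmeister(0.0, 2.0)      # constraint binds, multiplier > 0
interior_case = fischer_burmeister(3.0, 0.0)   # constraint slack, multiplier = 0

# Complementarity fails: both strictly positive, or one side negative.
both_positive = fischer_burmeister(1.0, 1.0)   # 2 - sqrt(2) > 0
negative_side = fischer_burmeister(-1.0, 0.0)  # -1 - 1 = -2

# The kink at the origin: one-sided slopes along the a-axis differ.
h = 1e-6
slope_right = (fischer_burmeister(h, 0.0) - fischer_burmeister(0.0, 0.0)) / h
slope_left = (fischer_burmeister(0.0, 0.0) - fischer_burmeister(-h, 0.0)) / h
```

The one-sided slopes at the origin (0 from the right, 2 from the left along the first argument) are the non-differentiability that the questions above about training dynamics refer to.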

Practical Applications

Immediate Applications

The following use cases can be deployed with current modeling workflows by extending existing deep-learning residual solvers (e.g., DEQN) to include coverage design, audited continuation surrogates, and held‑out residual certification.

Finance and Insurance

Robust disaster‑risk asset pricing and risk premia estimation — sector: finance
- What: Use EWMs to compute prices and premia in models with rare disasters, leverage cycles, and occasionally binding constraints; certify off‑path (tail) states that drive valuations.
- Tools/workflows: EWM plugin for risk engines that (i) designs a coverage schedule over stressed regimes, (ii) trains an action‑conditioned continuation surrogate for expectations, (iii) reports held‑out exact residuals over rare/counterfactual states.
- Assumptions/dependencies: A well‑specified structural macro‑finance model; calibrated shock processes; sufficient compute for quadrature and training; coverage must include price‑relevant tails; audit valid only on the covered region.
Bank/insurer stress testing under macro‑financial disasters — sectors: finance, insurance
- What: Replace pathwise certification with coverage‑certified stress scenarios for solvency, liquidity, and capital planning; ensure policy functions are valid in disaster regimes relevant for CCAR/SREP/ORSA‑like exercises.
- Tools/workflows: “Coverage designer” that mixes ergodic, rare, and locally perturbed states; off‑path residual dashboards integrated into supervisory reporting.
- Assumptions/dependencies: Structural balance‑sheet dynamics; governance for model risk; regulator acceptance of structural audits; high‑quality shock discretization.
Reinsurance and catastrophe‑risk capital allocation — sector: insurance
- What: Dynamic capital and reinsurance design in models with low‑frequency, high‑severity shocks; certify performance in stressed states (post‑event capital scarcity, binding constraints).
- Tools/workflows: EWM‑based optimizer with action‑conditioned continuation (reinsurance choice shifts future loss distribution).
- Assumptions/dependencies: Structural loss dynamics; exposure/correlation modeling; action affects transition measure (e.g., retention changes loss tail).

Public Policy and Central Banking

Counterfactual monetary and macroprudential policy in rare regimes — sector: public policy
- What: Evaluate policy when constraints bind unevenly (e.g., ZLB, credit frictions); certify generalized impulse responses and policy counterfactuals off the ergodic path.
- Tools/workflows: EWM extension of DSGE/HANK solvers with target coverage over tail states and an audit that reports exact residuals on held‑out counterfactuals.
- Assumptions/dependencies: Trusted structural model; calibrated shocks; adequate coverage of policy‑relevant states; institutional buy‑in for audit metrics.
Distributional effects of monetary/fiscal policy (compressed HANK) — sectors: public policy, academia
- What: Use learned encoders (JEPA‑style) to compress the wealth distribution while enforcing exact full‑distribution rational‑expectations conditions; quantify distributional impacts at tractable cost.
- Tools/workflows: Encoder–predictor training paired with exact forward population transitions and held‑out residual audits; ≥25× distribution compression demonstrated.
- Assumptions/dependencies: Encoder captures decision‑relevant distributional features; calibration data on heterogeneity; audit remains on full distribution (not just compressed state).

Climate and Energy

Climate‑disaster policy analysis and adaptation design — sectors: climate, energy, public policy, insurance
- What: Compute social cost of carbon, adaptation investment, and resilience policy in models with rare climate shocks; certify tails where welfare and prices are sensitive.
- Tools/workflows: EWM with action‑conditioned continuation (adaptation/protection reduces future hazard probabilities); stress‑coverage over extreme climate states.
- Assumptions/dependencies: Structural IAM or sectoral climate model; calibrated hazard processes; representation of actions that shift transition measures; coverage must include extreme events.
Power‑system investment under rare outages and irreversibility — sector: energy
- What: Plan capacity/renewables/storage with irreversible investments under rare outage shocks; ensure policies are valid around binding constraints and stressed grid states.
- Tools/workflows: EWM‑enabled dynamic planning model; surrogate‑amortized expectations; off‑path residual audits for low‑probability outage regimes.
- Assumptions/dependencies: Structural demand/supply and outage processes; discretization for shocks; high‑dimensional state handling via coverage rather than uniform grids.

Operations, Supply Chain, and Cyber Risk

Resilience investment that shifts future risk distributions — sectors: supply chain, cybersecurity, operations
- What: Optimize diversification, inventory buffers, or cyber controls when actions change disruption/breach probabilities; recover policy margins that state‑only continuations miss.
- Tools/workflows: Action‑conditioned continuation (Q‑function analogue) with coverage over disruption tails; certified counterfactuals for stress scenarios.
- Assumptions/dependencies: Structural transition where controls alter hazard rates; calibrated shock models; adequate coverage of rare disruptions.
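A toy calculation can show why a state-only continuation misses the policy margin when the action changes the hazard. The sketch below is an illustration under stated assumptions, not the paper's model: the exponential hazard `P0 * exp(-K * a)`, the linear cost, the two continuation values, and all parameter values are hypothetical.

```python
import math

BETA, COST = 0.95, 0.5        # discount factor and unit cost of the control (assumed)
P0, K = 0.10, 2.0             # baseline hazard and control effectiveness (assumed)
V_OK, V_HIT = 10.0, 0.0       # continuation values without / with a disruption (assumed)

def hazard(a):
    """Disruption probability after investing a >= 0 in resilience."""
    return P0 * math.exp(-K * a)

def q_continuation(a):
    """Action-conditioned continuation: the expectation uses the hazard implied by a."""
    p = hazard(a)
    return (1.0 - p) * V_OK + p * V_HIT

def v_continuation(a):
    """State-only continuation: the expectation is frozen at the baseline hazard."""
    return (1.0 - P0) * V_OK + P0 * V_HIT

def best_action(continuation, grid):
    """Grid search for the action maximizing -COST*a + BETA*continuation(a)."""
    return max(grid, key=lambda a: -COST * a + BETA * continuation(a))

grid = [i / 1000 for i in range(3001)]          # candidate actions in [0, 3]
a_q = best_action(q_continuation, grid)
a_v = best_action(v_continuation, grid)
# Interior optimum from the first-order condition COST = BETA*K*hazard(a)*(V_OK - V_HIT).
a_closed = math.log(BETA * K * P0 * (V_OK - V_HIT) / COST) / K
print(a_q, a_v, a_closed)
```

With the state-only continuation the control only costs money, so the search picks zero investment; conditioning on the action restores the marginal benefit of a lower hazard and the grid optimum lands next to the first-order-condition solution.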

Software and Research Infrastructure

EWM add‑on for existing neural solvers — sectors: software, academia
- What: Integrate coverage measures, continuation surrogates, and held‑out audits into DEQN‑style pipelines; reduce continuation evaluations by 10–100× while improving tail certification.
- Tools/workflows: Open‑source EWM wrapper (JAX/PyTorch), coverage schedulers for reach κ, audit modules reporting max/mean residuals by region.
- Assumptions/dependencies: GPU/TPU resources; reproducible quadrature; careful coverage design; code verification against exact residual tests.
Teaching and replication kits — sector: education
- What: Classroom labs illustrating self‑confirming vs. rational‑expectations certification, coverage design, and off‑path audits using Brock–Mirman and IRBC testbeds.
- Tools/workflows: Ready‑to‑run notebooks with placebo (uniform‑box) vs. structural‑coverage comparisons.
- Assumptions/dependencies: Standard ML stacks; small‑scale models for didactics.
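An audit module that reports max/mean residuals by region can be sketched on a Brock–Mirman testbed. The example below assumes log utility and full depreciation, for which the savings rule k' = αβ z k^α is the known closed form; the paper's laboratory may use a different specification. The parameter values, the two-point i.i.d. shock discretization, the `flawed_policy` candidate, and the function names are all hypothetical.

```python
ALPHA, BETA = 0.36, 0.96                 # technology and discount parameters (assumed)
SHOCKS = [(1.0, 0.98), (0.6, 0.02)]      # (productivity z, probability): normal, disaster (assumed)

def exact_policy(k, z):
    """Closed-form savings rule k' = alpha*beta*z*k^alpha (log utility, full depreciation)."""
    return ALPHA * BETA * z * k ** ALPHA

def flawed_policy(k, z):
    """Hypothetical candidate that undersaves by 20% in the disaster state only."""
    scale = 0.8 if z < 1.0 else 1.0
    return scale * exact_policy(k, z)

def euler_residual(policy, k, z):
    """Unit-free Euler residual beta*c*E[alpha*z'*k'^(alpha-1)/c'] - 1, exact expectation."""
    k_next = policy(k, z)
    c = z * k ** ALPHA - k_next
    expectation = 0.0
    for z_next, prob in SHOCKS:          # exact sum over the finite shock discretization
        c_next = z_next * k_next ** ALPHA - policy(k_next, z_next)
        expectation += prob * ALPHA * z_next * k_next ** (ALPHA - 1.0) / c_next
    return BETA * c * expectation - 1.0

def audit(policy, states):
    """Return {region: (max |residual|, mean |residual|)} over held-out (k, z) states."""
    by_region = {}
    for k, z in states:
        region = "disaster" if z < 1.0 else "normal"
        by_region.setdefault(region, []).append(abs(euler_residual(policy, k, z)))
    return {r: (max(v), sum(v) / len(v)) for r, v in by_region.items()}

held_out = [(k, z) for k in (0.05, 0.1, 0.2, 0.4) for z, _ in SHOCKS]
report_exact = audit(exact_policy, held_out)
report_flawed = audit(flawed_policy, held_out)
print(report_exact)
print(report_flawed)
```

Reporting by region matters here because the flawed candidate looks accurate in normal states, where the disaster branch enters the expectation with small probability, while its residual in the disaster region is large.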

Long‑Term Applications

These opportunities require further methodological development, scaling, or institutional adoption before routine deployment.

Real‑time “structural digital twins” for economies — sectors: public policy, finance
- What: Always‑on EWM systems that ingest high‑frequency data to recalibrate shocks and run certified counterfactuals/stress tests in near real time.
- Tools/products: Central‑bank macro twin with coverage‑aware scenario generator and automated residual audits.
- Dependencies: Robust online calibration; streaming data integration; governance for model updates; compute orchestration.
Regulatory standards for off‑path residual audits — sectors: finance, insurance, public policy
- What: Incorporate coverage‑based held‑out exact residual metrics into supervisory model validation for stress testing and capital requirements.
- Tools/products: Audit protocols and benchmarks; standardized coverage libraries for sector‑specific rare events.
- Dependencies: Regulator consensus; transparency/interpretability tooling; industry adoption.
Large‑scale HANK with real‑time distributional forecasting — sectors: public policy, finance
- What: Deploy compressed‑distribution EWMs inside forecasting stacks to provide distribution‑aware projections and policy analysis at operational speed.
- Tools/products: JEPA‑style encoders co‑trained with macro solvers; dashboards for distributional counterfactuals.
- Dependencies: Stable encoder generalization; data on household finance; scalable population transitions; compute budgets.
Cross‑sector integrated climate–energy–macro models — sectors: climate, energy, finance, public policy
- What: Joint structural world model spanning macroeconomy, power system, and climate hazards with certified off‑path behavior for compound extremes.
- Tools/products: Multi‑module EWM with coordinated coverage across domains; action‑conditioned continuations for adaptation and market design.
- Dependencies: Interoperable structural modules; calibration across sectors; high‑dimensional state handling; governance for uncertainty.
Autonomous policy exploration and stress design — sectors: public policy, finance
- What: EWM‑driven exploration of policy spaces (taxes, capital buffers, macroprudential tools) to discover robust rules that remain certified in tails.
- Tools/products: Coverage‑adaptive curriculum that expands κ where residuals indicate gaps; policy search with action‑conditioned world components.
- Dependencies: Safe exploration constraints; interpretability; computational budgets; human‑in‑the‑loop oversight.
Enterprise risk “coverage analytics” platforms — sectors: finance, insurance, energy, supply chain
- What: Platforms that quantify where models are certified (coverage sets), highlight residual gaps, and guide additional data/simulations to close them.
- Tools/products: Coverage maps, residual heat‑maps, κ‑scheduling assistants, and automated re‑training pipelines.
- Dependencies: Integration with heterogeneous internal models; unified shock libraries; model‑risk governance.
Personal finance and retirement planning under macro tail risk — sector: fintech
- What: Long‑horizon planning tools that account for rare macro shocks using small structural models with EWM certification of tail behavior.
- Tools/products: Consumer apps exposing scenario ranges with certification badges for covered tails.
- Dependencies: Simplified yet credible structural models; explainability; regulatory compliance; user education.
Epidemic control and public‑health planning — sector: public health
- What: Dynamic policy models with actions (NPIs, vaccination campaigns) that shift infection dynamics; EWM ensures certification for rare outbreak regimes.
- Tools/products: Action‑conditioned continuations; stress‑coverage over super‑spreading/extreme waves; audited counterfactuals.
- Dependencies: Epidemiological structural validity; data quality; interagency coordination; compute.
Advanced operations planning under systemic shocks — sectors: logistics, manufacturing
- What: High‑dimensional multi‑region supply networks with irreversible investment and rare disruptions; certified policies for extreme nodes (port closures, geopolitical shocks).
- Tools/products: EWM‑based solvers with state compression and action‑conditioned transitions.
- Dependencies: Structural supply‑chain models; calibrated disruption processes; significant compute.
AI‑assisted modeling co‑pilots for economists — sector: software/education
- What: Assistants that propose coverage schedules, diagnose self‑confirmation, and auto‑generate held‑out audits for new models.
- Tools/products: IDE plugins for Dynare/QuantEcon/JAX; automated residual diagnostics and ablation studies.
- Dependencies: Standardized model interfaces; validation datasets; human oversight.

Notes on feasibility across applications:

Structural dependency: EWMs assume the transition law, equilibrium conditions, and shock processes are specified by the modeler; certification is meaningful only relative to that maintained model.
Coverage design: Benefits hinge on including the states where decisions, prices, or welfare are evaluated; outside covered regions, no guarantees apply.
Surrogate integrity: Continuation surrogates must be trained and audited against exact quadrature; when actions move transition measures, surrogates must be action‑conditioned to avoid collapsing margins.
Compute and calibration: High‑dimensional models require GPUs/TPUs and careful shock discretization; calibration uncertainty and model misspecification remain first‑order risks independent of computation.

Glossary

action-conditioned continuation: A continuation (future-value) approximation that depends on both the current state and the chosen action, used when actions change the distribution of next-period states. Example: "EWMs use action-conditioned continuations to recover the relevant policy margin."
Berk--Nash equilibrium: An equilibrium concept in which agents act optimally given their beliefs, and those beliefs are the best fit, within a possibly misspecified model, to the data generated by the agents' own actions. Example: "Berk--Nash equilibrium, in which an agent's beliefs are optimal within a possibly misspecified model and confirmed by the data her own actions generate"
Brock--Mirman growth model: A classic stochastic dynamic model of capital accumulation used as a laboratory for studying solution methods in macroeconomics. Example: "textbook Brock--Mirman growth model"
complementarity condition: A set of inequalities involving a nonnegative multiplier and a nonnegative slack with a product equal to zero, expressing complementary slackness in optimization. Example: "A complementarity condition is a set of inequalities: a nonnegative multiplier $\mu\ge0$, a nonnegative constraint slack $s\ge0$, and the complementary-slackness requirement $\mu s=0$"
complementary-slackness: The requirement in constrained optimization that the product of a constraint’s slack and its Lagrange multiplier is zero at an optimum. Example: "the complementary-slackness requirement $\mu s=0$"
coverage measure: A model-generated distribution of states (including rare and stressed ones) on which equilibrium conditions are enforced and audited. Example: "EWM instead trains and audits on a coverage measure generated from the model's own transition"
coverage set: The set (support) of states covered by the coverage measure where residual accuracy is evaluated. Example: "bound off-path accuracy by quantities observable on the coverage set"
curse of dimensionality: The exponential growth in computational cost and data requirements as the dimensionality of the state space increases. Example: "the curse of dimensionality obstructs grid-based representations"
Deep Equilibrium Nets (DEQNs): Neural-network-based solvers that train policies by minimizing equilibrium residuals along simulated paths without labeled data. Example: "Deep Equilibrium Nets train a policy network by minimizing squared structural residuals along simulated paths"
Euler-residual methods: Approaches that train policies by driving the Euler equation residuals toward zero, typically with neural networks. Example: "deep-learning Euler-residual methods"
ergodic measure: The invariant long-run distribution of states visited under a given policy and stochastic process. Example: "the policy-induced invariant (ergodic) measure"
exact-residual audit: An evaluation of solution accuracy using the model’s exact equilibrium residual on held-out states not used in training. Example: "the held-out exact-residual audit"
Fischer--Burmeister function: A function, smooth everywhere except at the origin, that is zero exactly when $a\ge0$, $b\ge0$, and $ab=0$; it converts a complementarity condition into a single equation for residual-based optimization. Example: "Fischer--Burmeister function $\phi(a,b)=a+b-\sqrt{a^2+b^2}$"
Gaussian-process: A nonparametric Bayesian modeling approach often used as a surrogate for expensive function evaluations. Example: "Gaussian-process or neural-network-based surrogates for expensive economic computations"
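generalized impulse-response functions: Impulse responses defined as the difference between the conditional expectations of future outcomes with and without a shock, given the initial state; used in nonlinear models, where the response depends on the starting state and on the size and sign of the shock, and computed here from states that may lie off the ergodic path. Example: "generalized impulse-response functions"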
generalized moments: Learned or hand-chosen summary statistics of high-dimensional distributions used to approximate state information in heterogeneous-agent models. Example: "learned generalized moments"
Hamilton--Jacobi--Bellman: A partial differential equation characterizing optimal control and value functions in continuous-time dynamic optimization. Example: "coupled Hamilton--Jacobi--Bellman, Kolmogorov-forward, and master equations"
heterogeneous-agent economy: A model with many agents who differ in state variables (e.g., wealth), often requiring distributional state representations. Example: "a heterogeneous-agent economy with aggregate risk"
Joint Embedding Predictive Architecture (JEPA): A representation-learning method that pairs an encoder with a predictor of the next-period embedding. Example: "trained as a Joint Embedding Predictive Architecture"
Karush--Kuhn--Tucker (KKT): Necessary conditions for optimality in constrained optimization problems, including stationarity, primal feasibility, dual feasibility, and complementarity. Example: "the Karush--Kuhn--Tucker (KKT) complementarity conditions"
Kolmogorov-forward: The forward equation governing the evolution of probability distributions over time in stochastic processes. Example: "coupled Hamilton--Jacobi--Bellman, Kolmogorov-forward, and master equations"
Krusell--Smith: A method that summarizes the distributional state in heterogeneous-agent economies with a small set of moments to approximate equilibrium. Example: "The classic Krusell--Smith approach summarizes that distribution by a few moments"
Lucas critique: The argument that policy evaluations should be based on structural models because reduced-form relationships change when policies change. Example: "the structural discipline emphasized by the Lucas critique"
mean-field: A modeling framework that studies economies with many agents by focusing on the limiting behavior of distributions and aggregate effects. Example: "mean-field formulations characterize values and distributions through coupled Hamilton--Jacobi--Bellman, Kolmogorov-forward, and master equations"
market-clearing conditions: Equations ensuring that aggregate supply equals aggregate demand in each market within the model. Example: "the Euler equations, the Karush--Kuhn--Tucker (KKT) complementarity conditions in Fischer--Burmeister form, and the market-clearing conditions, into a single vector"
master equations: Equations describing the evolution of value functions and distributions in mean-field games/economies, tying together micro and macro states. Example: "coupled Hamilton--Jacobi--Bellman, Kolmogorov-forward, and master equations"
parameterized-expectations methods: Techniques that approximate conditional expectations with parametric functions and find fixed points consistent with model dynamics. Example: "parameterized-expectations methods"
Q-function: In dynamic programming and reinforcement learning, the expected discounted return from taking a given action in a given state and following the policy thereafter; because it depends on the action, it preserves action-dependent margins. Example: "the analogue of a Q-function in dynamic programming"
quadrature rule: A numerical integration method (with nodes and weights) used to approximate expectations in stochastic models. Example: "we fix a quadrature rule and identify the exact model with this finite discretization"
rational-expectations equilibria: Equilibria in which agents’ expectations are consistent with the model’s actual laws of motion and outcomes. Example: "unsupervised deep-learning solvers for rational-expectations equilibria"
self-confirming equilibrium: An equilibrium in which beliefs are required to be correct only on the path generated by the agent's own actions, leaving off-path beliefs unrestricted. Example: "self-confirming equilibria require beliefs to be correct on histories generated by the agent's own behavior, while off-path beliefs remain unrestricted"
structural residual: The vector of deviations from the model’s own optimality, complementarity, and market-clearing conditions used as the training and evaluation target. Example: "The equilibrium is characterized by a structural residual, the equilibrium conditions of the economic model written as deviations from zero"
transition law: The model’s structural mapping from current states and actions (plus shocks) to next-period states. Example: "the transition law, the equilibrium conditions, and the reported accuracy criterion remain the model's exact structural objects"
world model: A model of the environment used to simulate or rehearse imagined experience for planning or training without interacting with the real system. Example: "a world model lets an agent improve by rehearsing imagined experience inside a model of its environment"
continuation surrogate: A learned, cheap approximation to the conditional expectation term (continuation) in the Euler equation, used to amortize costly evaluations. Example: "a continuation surrogate amortizes the conditional expectation entering the Euler equation"
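The complementarity condition and Fischer--Burmeister entries above can be checked numerically. The sketch below evaluates $\phi(a,b)=a+b-\sqrt{a^2+b^2}$ on a few multiplier/slack pairs; the function name `fischer_burmeister` and the test cases are illustrative choices, not taken from the paper.

```python
import math

def fischer_burmeister(a, b):
    """phi(a, b) = a + b - sqrt(a^2 + b^2); zero iff a >= 0, b >= 0, and a*b = 0."""
    return a + b - math.hypot(a, b)

# Each case is a (multiplier mu, slack s) pair.
cases = {
    "slack constraint (mu=0, s=2)": (0.0, 2.0),
    "binding constraint (mu=3, s=0)": (3.0, 0.0),
    "violates mu*s=0 (mu=1, s=1)": (1.0, 1.0),
    "negative multiplier (mu=-1, s=0)": (-1.0, 0.0),
}
values = {name: fischer_burmeister(mu, s) for name, (mu, s) in cases.items()}
for name, val in values.items():
    print(f"{name}: {val:+.4f}")
```

The first two cases satisfy all three conditions and give a zero residual; the last two violate complementarity or nonnegativity and give a nonzero value, which is what lets a solver fold the inequalities into a single residual entry.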
