
What Capable Agents Must Know: Selection Theorems for Robust Decision-Making under Uncertainty

Published 3 Mar 2026 in cs.LG, cs.AI, cs.RO, q-bio.NC, and stat.ML | (2603.02491v1)

Abstract: As artificial agents become increasingly capable, what internal structure is necessary for an agent to act competently under uncertainty? Classical results show that optimal control can be implemented using belief states or world models, but not that such representations are required. We prove quantitative "selection theorems" showing that low average-case regret on structured families of action-conditioned prediction tasks forces an agent to implement a predictive, structured internal state. Our results cover stochastic policies, partial observability, and evaluation under task distributions, without assuming optimality, determinism, or access to an explicit model. Technically, we reduce predictive modeling to binary "betting" decisions and show that regret bounds limit probability mass on suboptimal bets, enforcing the predictive distinctions needed to separate high-margin outcomes. In fully observed settings, this yields approximate recovery of the interventional transition kernel; under partial observability, it implies necessity of belief-like memory and predictive state, addressing an open question in prior world-model recovery work.


Summary

  • The paper demonstrates that achieving low average-case regret in structured tasks necessitates the emergence of predictive internal representations.
  • It details a reduction of predictive modeling to binary betting decisions, leading to provable bounds on transition kernel recovery and belief formation.
  • The results reveal that minimized regret forces agents to adopt modular, regime-sensitive architectures with decision-relevant internal states.

Selection Theorems for Internal Structure in Robust Decision-Making under Uncertainty

Motivation and Problem Statement

The paper "What Capable Agents Must Know: Selection Theorems for Robust Decision-Making under Uncertainty" (2603.02491) addresses the question: What internal representations are forced upon agents that achieve competent, robust decision-making under uncertainty? Classical results demonstrate that belief states and world models are sufficient for optimal control in POMDPs, but these results do not establish necessity. That is, optimality can be achieved via belief-like representations, but an agent may also succeed without modeling predictive structure if its tasks or evaluation regime do not demand it. The paper closes this gap by proving quantitative "selection theorems" that show low average-case regret in structured action-conditioned prediction tasks compels agents to instantiate predictive internal state.

Conceptual Framework and Technical Approach

The paper operates in the setting of sequential decision-making—both fully observed MDPs and partially observed POMDPs—with goal-conditioned evaluation protocols. It departs from previous work by not assuming worst-case optimality or determinism, and instead considers the average normalized regret achievable by stochastic policies across a family of diagnostic prediction tasks.

The central technical innovation is reducing predictive modeling requirements to families of binary "betting" decisions. Regret decomposition reveals that average-case performance bounds constrain the probability mass assigned to suboptimal betting branches. When evaluation regimes include large-margin tests (where predictive uncertainty is not trivial), agents are forced to refine their internal state to resolve the predictive distinctions necessary for effective decision-making. In fully observed environments, this yields approximate recovery of the transition kernel; in partially observed settings, it enforces belief-like memory and predictive representations, thereby formalizing when such structure is necessary rather than merely sufficient.
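
To make the reduction concrete, here is a minimal sketch (not from the paper) of the margin-style regret decomposition it relies on: on a binary bet whose two options differ in success probability by some margin, any probability mass the policy places on the worse option contributes regret proportional to that margin, so an average-regret bound directly caps the wrong-bet mass on large-margin tests. The numbers and variable names below are illustrative.

```python
def betting_regret(p_left, p_right, q_wrong):
    """Regret of a stochastic bettor on one binary betting goal.

    p_left, p_right: success probabilities of the two options (hypothetical values).
    q_wrong:         probability mass the policy puts on the worse option.
    Returns (regret, margin); regret = q_wrong * margin, the standard
    margin-style decomposition the reduction uses.
    """
    margin = abs(p_right - p_left)
    return q_wrong * margin, margin

# If the average regret over many gamma-margin tests is at most delta_bar,
# then the average wrong-bet mass on those tests is at most delta_bar / gamma.
delta_bar, gamma = 0.01, 0.25                   # assumed evaluation constants
print(betting_regret(0.2, 0.8, q_wrong=0.1))    # -> approx (0.06, 0.6)
print("average wrong-bet mass <=", delta_bar / gamma)
```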

Key Results and Theorems

Fully Observed Environments: Transition Kernel Recovery

For MDPs where the agent observes the true state and transitions, the paper constructs composite goal families that encode multi-attempt binary betting tasks. A central theorem demonstrates that, under a bound $\bar\delta$ on average-case regret over diagnostic task families, a stochastic policy’s choices encode a “soft” estimator $\widehat P_{ss'}(a)$ of the transition kernel with provable mean absolute error:

$$\mathbb{E}_{(s,a,s')}\left[\,\bigl|\widehat P_{ss'}(a) - P_{ss'}(a)\bigr|\,\right] \;\le\; \frac{t_\gamma}{\sqrt{n}} + \frac{\bar\delta}{c(\gamma)} + O\!\left(\frac{1}{n}\right)$$

where $t_\gamma$ and $c(\gamma)$ are constants determined by the margin threshold $\gamma$. This formalizes that competency over multi-step tasks requires encoding transition structure with precision increasing in the task horizon.
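
As a quick illustration (with assumed constants, not values from the paper), the bound can be evaluated numerically: the first term shrinks with the number of attempts $n$, while the second term is a floor set by the achievable average regret.

```python
import math

def kernel_error_bound(n, t_gamma, delta_bar, c_gamma, higher_order=1.0):
    """Right-hand side of the mean-absolute-error bound quoted above:
        t_gamma / sqrt(n) + delta_bar / c(gamma) + O(1/n),
    with the O(1/n) term modelled as higher_order / n. All constants here
    are illustrative assumptions, not values reported in the paper.
    """
    return t_gamma / math.sqrt(n) + delta_bar / c_gamma + higher_order / n

for n in (10, 100, 1000):
    print(n, round(kernel_error_bound(n, t_gamma=1.0, delta_bar=0.01, c_gamma=0.2), 4))
```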

The causal content is also characterized: satisfaction of these regret constraints yields recovery of the interventional kernel $P(S_{t+1} \mid S_t, \mathrm{do}(A_t=a))$ (Pearl’s Level 2 interventions) to within error determined by the average regret bound and structural mismatch. However, crucially, the paper proves that recovery of Level 3 (counterfactual) queries is not generically possible from these constraints—two SCMs can yield identical interventional kernels yet differ in their counterfactual couplings.
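
The non-identifiability of Level 3 queries can be seen with the textbook two-SCM construction sketched below (the specific SCMs are illustrative, not taken from the paper): both models induce the same interventional kernel for a binary action and outcome, yet they disagree about what would have happened under the other action in a given episode.

```python
import random

def scm_flip(u, a):
    """SCM 1: the outcome flips with the action (Y = A XOR U)."""
    return a ^ u

def scm_const(u, a):
    """SCM 2: the outcome ignores the action (Y = U)."""
    return u

def interventional(scm, a, trials=50_000):
    """Estimate P(Y = 1 | do(A = a)) by sampling exogenous noise U ~ Bernoulli(1/2)."""
    return sum(scm(random.getrandbits(1), a) for _ in range(trials)) / trials

def counterfactual(scm, u):
    """With the exogenous noise U fixed by an observed episode, return
    (Y under do(A=0), Y under do(A=1))."""
    return scm(u, 0), scm(u, 1)

# Both SCMs have the same Level 2 (interventional) kernel: P(Y=1 | do(A=a)) ~ 0.5.
print(interventional(scm_flip, 0), interventional(scm_flip, 1))
print(interventional(scm_const, 0), interventional(scm_const, 1))

# But they differ at Level 3: given an episode with U = 0, SCM 1 says the
# outcome would have changed under the other action, while SCM 2 says not.
print(counterfactual(scm_flip, u=0))   # (0, 1)
print(counterfactual(scm_const, u=0))  # (0, 0)
```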

Partial Observability: Predictive-State and Memory Necessity

In partially observed settings, the betting reduction applies to tests in the style of predictive-state representations (PSRs), where the agent bets on future observation sequences under prescribed action trajectories. The main theorem states that, for any margin $\gamma$, low global average-case regret forces the agent to place vanishing probability mass on suboptimal bets for large-margin tests, quantifying the degree of internal structure required.

Moreover, the necessity of belief-like memory is proved via “no-aliasing” bounds: any memory statistic $M$ that aliases histories requiring opposite confident bets incurs unavoidable pairwise regret. Concretely, if an agent’s internal memory fails to distinguish histories which the optimal bets would separate with margin $\gamma$, then average pair-regret is forced to be at least $q^{\mathsf{Alias}_\gamma(M)}\, c(\gamma)/2$, where $q^{\mathsf{Alias}_\gamma(M)}$ is the mass of such aliasing events.
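
A minimal numerical sketch of why aliasing is costly (illustrative values, not the paper's construction): if a memory merges two histories whose optimal bets point in opposite directions with margin $\gamma$, the policy must use the same bet distribution after both, so the regret averaged over the pair is bounded below by a margin-dependent constant, mirroring the role of $c(\gamma)/2$ in the stated bound.

```python
def pair_regret(p_h, p_hprime, q_r_h, q_r_hprime):
    """Average betting regret over a pair of histories for one test.

    p_h, p_hprime: probability that betting "R" succeeds after each history.
    q_r_*:         probability mass the policy places on "R" after each history.
    """
    def regret(p, q_r):
        best = max(p, 1 - p)                       # success prob. of the better bet
        achieved = q_r * p + (1 - q_r) * (1 - p)   # success prob. of the stochastic bet
        return best - achieved
    return 0.5 * (regret(p_h, q_r_h) + regret(p_hprime, q_r_hprime))

gamma = 0.4
p_h, p_hp = 0.5 + gamma / 2, 0.5 - gamma / 2   # opposite confident bets, margin gamma

# A memory that aliases the two histories must use the same bet distribution q
# for both, so the pair regret is at least gamma / 2 whatever q it picks
# (this margin-dependent constant plays the role of c(gamma)/2 in the bound).
print(round(min(pair_regret(p_h, p_hp, q, q) for q in [i / 100 for i in range(101)]), 3))
```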

Structuring Internal Organization from Task Family

The paper further derives corollaries for modularity, regime sensitivity, and representational match:

  • Informational modularity: Block-structured task distributions select for modular memory representations that distinguish high-margin predictive tests within each block.
  • Regime tracking: Mixtures of task distributions select for persistent regime-tracking state; low regret across regimes compels memory to encode regime-sensitive distinctions.
  • Representational match: For vanishing-regret agents under a common margin-minimality requirement, their internal representations must agree up to invertible recoding on decision-relevant partitions, formalizing convergence of internal structure under shared competence constraints.
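
As a small illustration of what "agreement up to invertible recoding" means operationally, the sketch below (with hypothetical labelings, not the paper's formal construction) checks whether two agents' decision-relevant memory states induce the same partition of evaluation histories, differing only by a bijective relabeling.

```python
def agree_up_to_recoding(labels_a, labels_b):
    """Check whether two memory labelings of the same histories agree up to an
    invertible relabeling, i.e. there is a bijection between their label sets
    that maps one labeling onto the other, position by position."""
    forward, backward = {}, {}
    for a, b in zip(labels_a, labels_b):
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True

# Hypothetical decision-relevant states of two low-regret agents on the same
# evaluation histories: different labels, same induced partition.
agent1 = ["s0", "s1", "s0", "s2", "s1"]
agent2 = ["A",  "B",  "A",  "C",  "B"]
print(agree_up_to_recoding(agent1, agent2))   # True: they match up to recoding

agent3 = ["A", "A", "A", "C", "B"]            # aliases histories that agent1 separates
print(agree_up_to_recoding(agent1, agent3))   # False
```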

Practical and Theoretical Implications

These selection theorems establish that average-case regret bounds over structured predictive tasks impose quantitative, representation-theoretical constraints on agent architecture. The results connect capability with necessity: robust competence in uncertain, multi-regime environments selects for world modeling, belief-like memory, modularity, and persistent internal variables—not as optional architectural features, but as consequences of empirical task demands.

This provides a rigorous underpinning for the emergence of convergent internal representations in deep RL agents, world model architectures, and even biological neural systems. The theory aligns with empirical findings across visual, auditory, motor, and language modalities, as well as hypotheses in NeuroAI and cognitive science regarding the origin of modularity, global workspace, and unified predictive representations. The selection framework offers a principled explanation for observed representational alignments across artificial and biological systems, including studies that evaluate agent models against whole-brain recordings in animals.

From an engineering perspective, as AI systems become increasingly general and robust, the results suggest organizational signatures—predictive state, regime tracking, and modular decomposition—will emerge inevitably. This supports efforts in interpretability, capability benchmarking, and possibly AI alignment, as internal representations can be analyzed for decisional sufficiency and structural integrity.

Speculation on Future Directions

Future research may extend these selection theorems to richer task distributions, continuous domains, multi-agent settings, and hierarchical memory models. There is potential to leverage the theory both for designing architectures that systematically enforce desired internal structures and for empirically testing representational properties of advanced agents and biological systems.

Further, the explicit distinction between representation necessity and recovery suggests new avenues for understanding and diagnosing internal states in agents based on regret-based behavioral signatures. These results may inform philosophical and empirical approaches to emergent consciousness and agency in artificial systems, by formalizing which aspects of internal state become structurally mandatory under competent, uncertainty-sensitive operation.

Conclusion

The paper rigorously formalizes how robust, regret-bounded generalization under uncertainty compresses the admissible space of internal agent architectures, compelling belief-like predictive modeling, decision-sufficient memory, modular decomposition, and regime-sensitive internal variables. It separates the sufficiency and necessity of world modeling and establishes tractable quantitative constraints for both fully observed and partially observed settings. Selection theorems connect empirical competence to structural inevitability, offering a principled foundation for interpreting the emergence of internal representation and organization in increasingly capable artificial agents (2603.02491).

Explain it Like I'm 14

Overview

This paper asks a simple question with a deep answer: if you want a smart robot (or game-playing AI) to do well when it can’t be sure what will happen, what must it “know” on the inside? The authors show that if an agent reliably does well on a wide variety of prediction-and-action challenges, then it must carry a kind of internal “predictive map” of the world and a memory that keeps track of what it has seen so far. In short: to be robust and competent under uncertainty, an agent must become predictive.

Key objectives and questions

The paper focuses on five easy-to-understand questions:

  • If an agent performs well across many tasks where it must guess what will happen after it acts, does that force the agent to keep predictive information inside?
  • Can we show this even if the agent sometimes picks actions randomly (not always the same way), and even if it can’t fully see the true state of the world?
  • In fully visible worlds (where the agent sees everything), does good performance force it to learn “what happens next if I do action A in situation S”?
  • In partially visible worlds (where the agent only gets clues, not the full state), does good performance force it to keep a belief-like memory that separates situations that predict different futures?
  • How far can this internal knowledge go? Can the agent learn “what would have happened if…” (counterfactuals), or only “what happens if I do…” (interventions)?

Methods in everyday language

The authors turn complicated prediction problems into a very simple game: a series of yes/no bets.

  • Imagine a video game where, before you act, you must guess whether a certain future event will happen (like “Will I reach the goal more than k times if I try n times?”). You signal your guess with a one-bit “left” or “right” choice. Then the game unfolds, and you either guessed right or wrong.
  • The agent’s “regret” is how much worse it does than the best possible strategy for those bets on average. Low regret means the agent usually bets right.
  • The key idea: if the evaluation includes many bets that aren’t coin flips (there really is a right side to pick most of the time), then consistently getting them right means the agent must internally distinguish the situations that lead to different outcomes. That forces it to carry predictive information and, when needed, memory.
  • In fully observed worlds, this betting framework shows the agent must approximate the transition rules of the world: “From state S, if I do action A, how likely is next state S’?”
  • In partially observed worlds, the same logic shows the agent must keep a memory that separates histories that lead to different likely futures—even if the last observation looks the same. This looks like a belief state or a predictive-state representation.

In short: turn “predict the future under actions” into “make a bet.” If you avoid regret on lots of clear bets, you must be predicting well internally.

Main findings and why they matter

Here are the main takeaways:

  • Fully observed worlds: Good performance forces learning the “what-happens-when-I-do-X” rules.
    • If the agent does well on many multi-step bets, it must internally approximate the true chances that one state leads to another when it takes certain actions. This is like learning the game’s transition table.
  • Partially observed worlds: Good performance forces belief-like memory.
    • If the agent can’t see everything, two histories that look the same on the surface may still predict different futures. To keep regret low on the bets, the agent must keep memory that separates those histories when it matters. In other words, competent agents must carry belief-like or predictive state.
  • Interventions vs. counterfactuals: You can recover “what happens if I do A,” not “what would have happened if I had done B instead.”
    • The paper shows you can force the agent to learn intervention answers (Level 2 in causal terms: “If I press this button, what happens?”).
    • But without extra assumptions, you can’t force it to learn counterfactual couplings (Level 3: “Given I saw this outcome with action A, what would I have seen if I had chosen action B instead?”). That information is not determined just by getting the bets right.
  • Structured task families shape internal organization:
    • Modular tasks push modular internal information: if tests are separated into blocks that check different parts of the world, then low regret requires an internal structure that can keep those parts apart when needed.
    • Shifting mixtures (different regimes or modes) push regime-tracking: if tests come from different hidden “modes,” doing well requires a memory variable that tracks which mode you’re in.
    • Convergent representations: If two different agents both achieve very low regret and don’t keep extra unnecessary distinctions, their internal “decision-relevant” states will match up, up to a relabeling. In other words, good agents’ useful internal representations tend to line up.
  • More general and practical than earlier work:
    • The results hold even if the agent’s policy is stochastic (sometimes random), and they rely on average performance, not perfection in the worst case. This makes them more realistic for modern machine learning systems.

Implications and potential impact

  • For AI design and safety: If we want agents that are broadly competent under uncertainty, we should expect—and even test for—predictive internal models and belief-like memory. These are not optional add-ons; they are consequences of doing well on the right kinds of tasks.
  • For evaluation: Building test suites with clear, non-coin-flip bets about future outcomes under different actions will “select” for the right internal structure in learning agents.
  • For understanding advanced agents: As agents get better across varied tasks, they will tend to develop:
    • Predictive world models,
    • Memory that tracks what matters,
    • Modularity when tasks are modular,
    • Latent variables that track changing conditions (regimes).
  • For science and theory: This helps explain why different successful systems—human brains and artificial models—often end up with similar internal structures when faced with similar problem demands. It’s not an accident; it’s selected by the need to perform well under uncertainty.

Bottom line: If you want an agent that can handle the messy, uncertain real world, you inevitably get an agent that thinks ahead, keeps track of what it knows, and organizes its knowledge in sensible ways. Robust performance under uncertainty selects for predictive internal structure.

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Below is a single, concrete list of what the paper leaves unresolved, uncertain, or unexplored, to guide future research.

  • Extension beyond finite, stationary, communicating environments: formalizing and proving selection theorems for continuous state/action spaces, non-stationary dynamics, and non-communicating or weakly communicating MDPs/POMDPs.
  • Practical feasibility of composite goals in fully observed settings: how to construct and execute the required n attempts of (s, a) when visiting s is rare, costly, or unsafe, and how this affects regret bounds and identifiability.
  • Sensitivity to the evaluation distribution: replacing uniform sampling over S×A×S×{0,…,n} with realistic or task-driven distributions; characterizing how non-uniform sampling and skewed test coverage alter the bounds and the necessity claims.
  • Constructive design of test families in POMDPs: procedures to generate or learn witness sets Sγ(h, h′) that achieve sufficient margin and coverage; conditions guaranteeing their existence for arbitrary POMDPs.
  • Adaptive versus fixed tests: generalizing selection theorems to tests where action sequences α are chosen adaptively based on observations (closed-loop probing) rather than fixed sequences.
  • Sample complexity and query efficiency: bounding the number of tests/goals needed (and their horizon n) to achieve specified error/regret levels; optimizing D to minimize the number of queries for a given selection objective.
  • Robustness to measurement noise and evaluation errors: analyzing how noisy goal outcomes, stochastic evaluation protocols, or misspecified thresholds k in composite goals affect the estimator and regret bounds.
  • Tighter estimators and bounds: exploring alternative mappings from policy choice probabilities q to transition estimates (e.g., via proper scoring rules or likelihood-based estimators) and deriving sharper error/regret guarantees.
  • Lower bounds and impossibility results: establishing fundamental limits on recovery/necessity (e.g., minimax lower bounds) under average-case regret, especially in partial observability.
  • Extension to multi-class and continuous prediction goals: moving beyond binary bets to multi-outcome or continuous-valued predictive tasks and assessing whether analogous selection theorems hold.
  • Resource-constrained agents: incorporating computational/communication constraints (limited memory, time, precision) into the selection framework to quantify trade-offs between regret and representational complexity.
  • Training dynamics and learning algorithms: linking these necessity results to specific RL/representation learning objectives, exploration strategies, and optimization dynamics to determine when and how agents learn the required predictive state.
  • Role of stochastic policies beyond the report-bit device: characterizing identifiability and necessity when an agent cannot emit a separate report bit, or when policy stochasticity is structured (e.g., entropy-regularized) rather than arbitrary.
  • Non-degenerate evaluation assumptions: operational methods to verify or enforce the margin and coverage conditions (η, η′, qγ) in real domains; impact when these conditions are only approximately met.
  • Continuous-time and event-driven systems: extending results to semi-Markov, continuous-time, or asynchronous settings where the notion of attempt times Ti and horizons n needs reformulation.
  • Partial observability sufficiency/constructiveness: while memory necessity is shown, methods to recover or construct minimal decision-sufficient representations (belief or PSR) from regret-bounded behavior remain open.
  • Alias avoidance granularity: quantifying how fine-grained the no-aliasing requirement must be (as a function of γ and D), and how it translates into bounds on the number of memory states or representation dimension.
  • Identifiability of Pearl Level 3 counterfactuals: specifying structural assumptions (e.g., noise models, invariances, independence constraints) under which Level 3 queries become identifiable from agent behavior and tests, or proving stronger impossibility.
  • Causal content beyond transition kernels: conditions under which inter-variable causal relations (within state vector components) can be inferred from regret-bounded behavior, especially when transitions are stochastic and factored.
  • Mixture/regime tracking in practice: methods to detect, label, and track latent regimes from behavior alone; quantifying how much regime-switch evidence is required to force persistent modulators under average regret.
  • Modular task families: systematic design of block-structured test distributions and diagnostics that enforce specific modular decompositions; measuring emergent modularity and its dependence on D and γ.
  • Representation match scope: circumstances under which the invertible recoding result extends across agents trained on different but overlapping task families or across domains; robustness to partial support mismatch.
  • Dependence on horizon n: characterizing how quickly selection pressure strengthens with increasing n, and what minimal horizon is required for specific internal structures to become necessary.
  • Generalization under distribution shift: whether regret-bounded necessity persists when the evaluation distribution shifts or expands post-training; identifying invariants that remain selected under shift.
  • Safety and exploration constraints: integrating safety constraints, limited exploration budgets, or risk aversion into selection theorems to understand necessity under realistic operational constraints.
  • Empirical validation: designing controlled experiments (simulated and real) to test predicted emergence of predictive state, no-aliasing memory, modularity, and regime-tracking; quantifying alignment with brain-like representations as hypothesized.

Practical Applications

Immediate Applications

Below are applications that can be deployed now, drawing directly from the paper’s selection theorems, betting reduction, and PSR-style test design.

  • Regret‑based world‑model audit harness (software, robotics, RL)
    • What: Build a test harness that evaluates an agent’s average‑case regret on structured, action‑conditioned prediction tasks (“betting goals”). Use test families with nontrivial margins to diagnose whether the agent’s internal state must be predictive.
    • Tool/Product/Workflow: A “Betting Goal Evaluator” that:
    • Generates PSR‑style tests T=(α,W) and composite goals G.
    • Instruments agents with a “report bit” channel B∈{L,R} (or logs a pre‑action commitment) and executes prescribed action sequences α.
    • Computes normalized regret per test and aggregates over test distributions D and history distributions H.
    • Flags agents with low average regret as necessarily implementing predictive internal state (Theorem: Predictive modeling necessity).
    • Assumptions/Dependencies: Non‑degenerate evaluation distribution with margin γ>0 on a nontrivial fraction of tests; ability to run standardized action sequences; access to agent outputs or logs for the report bit; sufficient coverage of histories/tests.
  • Transition‑kernel estimation from behavior in fully observed tasks (robotics, industrial control, games/sim)
    • What: Estimate interventional transition probabilities P(S′|S,A) from a stochastic, goal‑conditioned policy’s choices on composite goals of depth n, using the soft estimator defined in the paper.
    • Tool/Product/Workflow: “World‑Model Estimator”:
    • Construct composite goals G^{(n)}_{s,a,s′,k} and collect the agent’s choice probabilities q_{s,a,s′,k}.
    • Compute a soft estimate via the paper’s formula: P̂_{ss′}(a) = (1/n)·(Σ_k (1 − q_{s,a,s′,k}) − 1/2) (see the code sketch after this list).
    • Confidence improves with n; the expected estimation error is bounded by a term that decreases as O(1/√n) plus a term proportional to the average regret (Theorem: Fully observed bound).
    • Assumptions/Dependencies: Fully observed, stationary, communicating environment; actions influence transitions; nontrivial margins; ability to stage n repeated diagnostic attempts; average regret bound over the diagnostic family.
  • Belief‑like memory detection via no‑aliasing tests (software, autonomy, RL safety)
    • What: Detect whether an agent’s memory collapses distinct predictive histories (aliasing) by comparing regret on paired histories with the same last observation but opposite large‑margin bets.
    • Tool/Product/Workflow: “No‑Aliasing Audit”:
    • Sample paired histories (h,h′) with identical last observation.
    • Construct witness test sets S_γ(h,h′) such that bets flip confidently across the pair.
    • Compute pair‑averaged regret; apply the lower bound linking regret and aliasing (Theorem: Memory necessity).
    • Report minimum necessary refinement of memory; flag architectures that alias decision‑relevant history.
    • Assumptions/Dependencies: Availability of paired histories and witness tests with margin γ; ability to condition tests on action sequences and measure outcomes; stationary evaluation for repeatability.
  • Curriculum and benchmark design that selects modularity and regime tracking (ML training, MLOps)
    • What: Use block‑structured test families and mixtures of regimes to induce, via selection pressure, informational modularity and regime‑sensitive internal state.
    • Tool/Product/Workflow:
    • “Block‑Structured Test Generator”: partitions tests into K disjoint blocks; competency on each block pushes the agent to avoid aliasing within each module (Corollary: Modularity).
    • “Regime Mixture Evaluator”: randomly shifts regimes I with different test distributions D_I; low regret requires memory to track these latent regimes whenever they flip high‑margin decisions (Corollary: Mixtures).
    • Assumptions/Dependencies: Ability to define and sample block partitions and regimes; existence of γ‑margin witness tests under each block/regime; stable evaluation protocols.
  • Representation comparison and invertible recoding between agents (model distillation, transfer, interpretability)
    • What: If two agents achieve vanishing regret under the same test family and are γ‑minimal/completed with respect to decision‑relevant partitions, their memory representations agree up to an invertible relabeling (Corollary: Representational match).
    • Tool/Product/Workflow: “Decision‑Relevant Partition Mapper”:
    • Evaluate two agents on the same test distribution.
    • Verify γ‑minimality and completeness conditions.
    • Learn invertible maps φ,ψ between memory codes to align models, aiding distillation/transfer and interoperability.
    • Assumptions/Dependencies: Shared evaluation distribution; agents reach low pair‑regret; satisfaction of γ‑minimality and completeness; access to memory readouts or proxies.
  • Governance and certification protocols based on average‑case regret (policy, compliance, AI safety)
    • What: Create standardized, non‑degenerate test suites and certify agents whose average regret falls below thresholds linked to predictive modeling and memory necessity.
    • Tool/Product/Workflow: “Selection‑Theorem‑Aligned Certification”:
    • Define test families with large‑margin bets; publish average regret thresholds for different application levels.
    • Certify compliance claims: predictive state necessity, no‑aliasing guarantees, regime‑tracking sensitivity.
    • Assumptions/Dependencies: Sector‑specific test design; agreement on margins γ and acceptable regret levels; practical approximations to V* (optimal success probability) via baselines or bounds; reproducible evaluation.
  • Reliability checks for consumer AI assistants (daily life, software)
    • What: Use betting‑style diagnostics to ensure assistants maintain predictive, regime‑sensitive memory (e.g., tracking context shifts across calendars, locations, or user preferences).
    • Tool/Product/Workflow: Lightweight “Context Bet” probes integrated into QA/validation pipelines that test confident predictions under action sequences (e.g., reminders, scheduling steps) and penalize aliasing.
    • Assumptions/Dependencies: Ability to simulate or log action sequences; privacy‑aware history pairing; sufficient margin tests tied to real workflows.
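
Below is a minimal sketch of the soft transition estimator quoted in the “World‑Model Estimator” workflow above. The exact semantics of the choice probabilities q_{s,a,s′,k} (which side of each bet they refer to, and how the threshold k is defined) follow the paper’s protocol and are not fully specified in this summary, so the function simply evaluates the quoted arithmetic on hypothetical inputs.

```python
import numpy as np

def soft_transition_estimate(q, n):
    """Soft estimator of P(s' | s, a) from a policy's choice probabilities on
    the composite betting goals, following the formula quoted above:
        P_hat_{ss'}(a) = (1/n) * ( sum_k (1 - q_{s,a,s',k}) - 1/2 ).

    q: choice probabilities q_{s,a,s',k} for k = 0, ..., n (length n + 1,
       assuming k ranges as in the evaluation distribution S x A x S x {0,...,n}).
    The bracketing of the -1/2 term follows the summary text; this is a sketch
    of the quoted formula, not the paper's full estimation protocol.
    """
    q = np.asarray(q, dtype=float)
    return (np.sum(1.0 - q) - 0.5) / n

# Hypothetical choice probabilities for one (s, a, s') triple with n = 4:
q_example = [0.05, 0.2, 0.8, 0.9, 1.0]
print(round(soft_transition_estimate(q_example, n=4), 3))   # the formula applied to these numbers
```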

Long‑Term Applications

These applications require further research, scaling, sector‑specific development, or stronger assumptions than those used in the paper.

  • Sector‑wide standardized selection‑theorem benchmarks (policy, standards, safety)
    • What: Mature, cross‑industry benchmarks for robust agency that enforce predictive state, memory sufficiency, modularity, and regime tracking.
    • Potential Products: NIST‑like “Robust Agency Benchmark Suite” with graded certifications and scenario libraries for autonomy, healthcare, finance, and energy.
    • Dependencies: Community consensus on test design; domain‑specific margins γ; tooling for repeatable evaluation in real systems; legal and compliance frameworks.
  • Architectures that explicitly optimize predictive state under partial observability (robotics, autonomous vehicles, industrial IoT)
    • What: PSR‑guided memory modules and training objectives that directly minimize average regret on informative test families, guaranteeing belief‑like internal state.
    • Potential Products: “PSR Memory Layer” for RL agents, integrated into popular frameworks; training recipes that couple regime‑mixture curricula to memory shaping.
    • Dependencies: Scalable generation of tests in complex environments; robust optimization under partial observability; deployment pipelines that preserve evaluation distribution properties.
  • Counterfactual (Level 3) causal inference for agents (healthcare, policy analysis, scientific discovery)
    • What: Extend beyond Level 2 interventional kernel recovery toward counterfactuals by combining the paper’s behavioral selection tools with explicit structural causal models and additional assumptions (e.g., exogenous noise coupling).
    • Potential Products: “Counterfactual Coupling Learner” for agent behavior; compliance tools for medical decision support that validate counterfactual sensitivity.
    • Dependencies: Instrumentation to learn or stipulate SCM structure; identifiability conditions; data sharing and governance; stronger assumptions than those in the paper (which shows Level 3 is not recoverable from interventions alone).
  • Affective/regime modulators for generalist agents (assistants, robotics, operations)
    • What: Design and evaluate persistent internal variables that track latent regimes (workload, risk, user goals) and globally modulate policy, attention, and learning—normatively selected by mixture‑of‑regimes competence requirements.
    • Potential Products: “Regime Controller” modules; dashboards showing regime‑sensitive memory states affecting action selection.
    • Dependencies: Reliable detection of latent regimes; human‑in‑the‑loop validation; safety constraints for modulation dynamics; domain adaptation.
  • Interoperability standards for decision‑relevant partitions (MLOps, multi‑agent systems)
    • What: Create an interchange format for memory states representing γ‑margin decision partitions, enabling agents to align representations up to invertible recoding.
    • Potential Products: “Decision‑Partition API” and “State Relabeling Layer” to compose heterogeneous agents or port competencies across models.
    • Dependencies: Shared evaluation suites; access to memory representations or surrogates; privacy/security for state exchange; performance guarantees under composition.
  • High‑stakes decision support under uncertainty (healthcare, energy, finance, supply chain)
    • What: Deploy regret‑based diagnostics to validate predictive memory and regime tracking before deployment in partial observability settings (e.g., EHR‑driven triage, grid operations with demand regimes, trading under market regimes).
    • Potential Products: Sector‑specific validation harnesses; pre‑deployment audit pipelines; ongoing monitoring with regime‑aware probes.
    • Dependencies: Domain‑specific tests with clear margins; safe simulation environments; robust estimation of optimal baselines; regulatory compliance and data governance.
  • Education: regime‑aware, predictive‑state AI tutors
    • What: Use test families to ensure tutors maintain belief‑like student models and adapt across latent regimes (topic difficulty, motivation).
    • Potential Products: “Tutor Audit Suite” with predictive tests and aliasing checks; curriculum generators enforcing modularity across subject blocks.
    • Dependencies: Privacy‑respecting history construction; pedagogical alignment of test families; validated regret thresholds for educational outcomes.
  • Autonomous driving and embodied AI: end‑to‑end evaluation at scale
    • What: Integrate selection‑theorem diagnostics into large‑scale simulation and fleet telemetry to verify predictive state and no‑aliasing under varied conditions.
    • Potential Products: “Fleet‑Regret Dashboard” with PSR‑style probes; continuous‑evaluation protocols.
    • Dependencies: High‑fidelity simulation; representative regimes; scalable data collection; handling nonstationarity and distribution shift.

Cross‑cutting assumptions and dependencies

  • Non‑degenerate evaluation distributions: a nontrivial fraction of tests must have margin γ>0; otherwise, constant or trivial policies can pass.
  • Average‑case regret measurement: requires many trials across diverse histories/tests; estimates of V* may need baselines or bounds when exact optimality is unknown.
  • Environment properties (for fully observed recovery): stationarity, communicating dynamics, and action influence; ability to stage repeated diagnostic attempts of depth n.
  • Instrumentation: access to a “report bit” (or equivalent pre‑action commitment) that does not affect dynamics; logging of action sequences and outcomes.
  • Partial observability: witness test construction and history pairing are necessary to detect aliasing; guarantees are selection‑style (necessity) rather than full model recovery.
  • Causality levels: the paper guarantees Level 2 interventional recovery under stated assumptions; Level 3 counterfactuals require additional structural information.

Glossary

  • Action-conditioned prediction tasks: Prediction objectives that depend on the actions the agent takes. "low average-case regret on structured families of action-conditioned prediction tasks forces an agent to implement predictive, structured internal state."
  • Aliasing: A memory representation collapsing distinct histories into the same internal state, potentially causing errors. "Define the aliasing event for M:"
  • Average-case regret: Expected (distribution-averaged) shortfall from optimal performance over a task family. "we assume only average-case regret rather than worst-case optimality;"
  • Behavioral distinguishability: The property that two histories can be distinguished by a test yielding different success probabilities. "Two histories h,h' are behaviorally distinguishable"
  • Belief-like memory: Internal memory that encodes belief-style predictive distinctions sufficient for confident decisions. "yields quantitative no-aliasing bounds for belief-like memory"
  • Belief states: Distributions over latent states that summarize history and suffice for optimal control. "optimal control can be implemented using belief states or world models"
  • Betting goal: A one-shot binary decision (bet) on whether an event will occur under a prescribed action sequence. "Each test T=(α,W) induces a one-shot betting goal g_T:"
  • Block-structured tests: Partitioned test families that impose modular informational demands. "Block-structured tests select for informational modularity"
  • Causal Markov-process (cMP): An interpretation of controlled Markov processes where actions correspond to causal interventions. "causal Markov-process (cMP) interpretation"
  • Communicating environment: An environment where any state can reach any other via some finite action sequence with positive probability. "we assume the environment is communicating, meaning that for any s,s'∈S there exists a finite action sequence that reaches s' from s with positive probability"
  • Composite goal family: A constructed family of goals combining a binary commitment with counted successes across multiple attempts. "The composite goal G^{(n)}_{s,a,s',k} is the event:"
  • Counterfactual couplings: The joint dependence between potential outcomes under different interventions, not identified from interventional kernels alone. "while differing in counterfactual couplings."
  • Diagnostic goal family: A subset of evaluation goals informative enough to constrain or recover aspects of the world model. "low average regret on the diagnostic goal family forces π to implicitly approximate Level 2 interventional queries"
  • Do-intervention: Pearl’s do-operator specifying an external intervention on an action variable. "corresponds to the intervention \mathrm{do}(A_t=a)"
  • Elicitation: The design of mechanisms that encourage truthful probability reports, often linked to scoring rules. "related to elicitation and proper scoring rules"
  • Evaluation distribution: The distribution over histories and tests used to measure average regret and competence. "When the evaluation distribution places nontrivial mass on large-margin tests"
  • Goal-conditioned policy: A policy whose choices depend on the specific goal being evaluated. "Let π be a (possibly stochastic) goal-conditioned policy."
  • Informational modularity: Internal organization that separates information according to task blocks or modules. "Block-structured tests select for informational modularity"
  • Internal Model Principle: A control-theoretic result asserting that effective regulation requires modeling the system. "formalized in linear control by the Internal Model Principle"
  • Interventional kernel: The distribution of next states under a do-intervention on the action. "Causal content: approximately recovered interventional kernel"
  • Interventional queries: Queries about distributions under interventions (do-operations) rather than mere observations. "Level 2 interventional queries"
  • Level 2 interventions: Pearl’s hierarchy level concerning interventional distributions identifiable from do-operations. "Level 2 interventions are recoverable"
  • Level 3 counterfactuals: Pearl’s hierarchy level concerning joint counterfactuals across different interventions, generally not identifiable here. "Level 3 counterfactuals are not"
  • Margin-style regret decomposition: An analysis expressing regret in terms of decision margin and mass on the wrong action. "instantiates a standard margin-style regret decomposition"
  • Mixtures of regimes: Evaluation mixtures drawn from different latent regimes, selecting for regime-sensitive internal state. "mixtures of regimes select for regime-sensitive internal state"
  • No-aliasing bounds: Quantitative guarantees that memory cannot collapse histories that require different confident predictions. "yields quantitative no-aliasing bounds for belief-like memory"
  • No-regret guarantees: Conditions ensuring algorithms avoid systematic loss, constraining required information. "No-regret guarantees constrain the information needed to avoid systematic loss"
  • Normalized regret: Regret scaled by optimal value, enabling comparisons across tasks. "Define the normalized regret as"
  • Partial observability: Settings where the agent receives observations that do not reveal the full state, requiring memory or belief. "partial observability, and evaluation under task distributions"
  • POMDP: A partially observed Markov decision process formalizing decision-making with latent states and noisy observations. "standard POMDP setting"
  • Predictive partition: The partition of histories induced by tests that yield distinct predictive probabilities. "refine the predictive partition induced by those tests"
  • Predictive state: The vector of test success probabilities summarizing decision-relevant predictions. "is the predictive state."
  • Predictive world model: An internal mechanism sufficient to compute action-conditioned test probabilities in a POMDP. "Predictive world model."
  • Predictive-state representations (PSRs): State representations defined by predictions of action-conditioned futures rather than latent variables. "predictive-state representations (PSRs)"
  • Proper scoring rules: Functions that incentivize truthful probability reporting in predictions and bets. "related to elicitation and proper scoring rules"
  • Regime-sensitive internal state: Memory or representation that changes with latent evaluative regimes to preserve correct decisions. "regime-sensitive internal state"
  • Regime-tracking variables: Persistent internal variables that track latent conditions influencing policy across tasks. "regime-tracking variables that resemble modulation mechanisms studied in affective neuroscience"
  • Representational match: The equivalence of different agents’ memory representations up to invertible recoding under the same evaluation. "Representational match under γ-minimality (up to invertible recoding)"
  • Selection theorems: Results showing that competence guarantees imply necessary internal predictive and memory structure. "We prove quantitative ``selection theorems''"
  • Soft estimator: An estimator formed from soft (probabilistic) choices rather than hard counts to infer transition probabilities. "Define the following (soft) estimator of the transition probabilities from π:"
  • Sufficient statistic: A summary of history that is decision-sufficient for optimal control. "function of a sufficient statistic."
  • Transition kernel: The probability function specifying next-state transitions given the current state and action. "T(x'\mid x,a) is the transition kernel"
  • Vanishing-regret agents: Agents achieving regret approaching zero under evaluation, leading to representational agreement. "any two vanishing-regret agents must agree"
  • World-model recovery: Inferring the environment’s transition or interventional dynamics from agent behavior and goals. "World model recovery in fully observed environments"
  • Wrong-action mass: The probability mass assigned to the suboptimal choice in a binary decision. "Define the wrong-action mass"

