Thermodynamic Measure of Intelligence
Abstract: Can intelligence be measured? We propose that intelligence can be defined as the lawful amplification of rare but valid futures: a system increases the probability of outcomes that would be unlikely under passive dynamics but remain admissible under the constraints of the domain. We start with the premise that an intelligent system must model the world and its own place within it. Because the system is part of the world it models, this leads naturally to recursive self-simulation: the system represents futures in which its own actions are part of the trajectory. Our central results give a necessity statement and a conditional near-sufficiency statement connecting this architecture to a precise thermodynamic measure of lawful amplification of rare-valid futures: high rare-valid lift is impossible unless the internal simulation identifies rare-valid futures with high fidelity; conversely, when rare-valid fidelity is high and the simulation contains an effective policy, the achievable lift approaches the actuation-limited optimum. Thus recursive self-simulation is not merely a plausible feature of intelligence but, under the stated assumptions, is necessary and nearly sufficient for high thermodynamic intelligence. The resulting framework makes intelligence measurable on a universal scale, from passive matter and feedback controllers, LLMs, and humans as text generators to Maxwell-demon-like information engines.
Explain it Like I'm 14
What Is This Paper About?
This paper asks a big question: Can we measure intelligence in a way that works for anything—machines, people, even physical “information engines” like Maxwell’s demon? The author suggests yes. Intelligence is defined as the ability to make unlikely-but-still-possible good outcomes happen more often. In short: an intelligent system shifts the future toward rare, valid successes.
To do that well, the system needs to imagine possible futures, include itself in those futures, and choose actions that make the good ones more likely. The paper turns this idea into a precise, physics-friendly number that can be compared across very different systems.
What Questions Does the Paper Try to Answer?
- How can we define intelligence in a way that doesn’t depend on the kind of system (brain, robot, AI model, or even a physics thought experiment)?
- Can we turn that definition into a number we can calculate and compare?
- What kind of “thinking” or internal modeling is necessary to score highly on such a measure?
- How does this connect to the rules of physics, especially energy, entropy, and information?
How Do They Approach It? (Explained Simply)
Think of the world like a board game that keeps playing on its own (the “baseline” way things happen if you don’t intervene). Now imagine an agent—like a person, robot, or program—who can make moves to steer the game. The paper focuses on:
- Passive dynamics: what happens “by default” if no one intervenes.
- Rare-valid futures: outcomes that are unlikely under the default but still allowed and “correct” (valid). For example, writing a flawless, creative poem is unlikely if you randomly press keys—but it’s still possible and valid.
- Self-simulation: the agent imagines future paths of the world, including its own future actions and how those would change what happens. This means thinking ahead while keeping itself “inside the picture.”
They define a score called rare-valid lift. In simple terms:
- Pick a target set of good outcomes that are rare under the baseline, with baseline chance δ (delta).
- Let the agent act, producing a new chance for those outcomes.
- The lift is the fractional increase: how much the agent raised the odds compared to the baseline, divided by the baseline.
Mathematically:
- Baseline probability of the rare-valid set: P₀(Vδ) = δ
- After the agent acts: P(Vδ)
- Rare-valid lift: Iδ = (P(Vδ) − δ) / δ
If Iδ = 0, the agent didn’t help at all. If Iδ is big, the agent made rare successes much more likely.
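The lift definition above (fractional increase over the baseline chance δ) can be written as a one-line function. This is a minimal sketch: the names `rare_valid_lift` and `max_possible_lift` are illustrative and not taken from the paper, and the "maximum" shown is only the trivial ceiling reached when the target set gets probability 1, which may be higher than the paper's actuation-limited optimum.

```python
def rare_valid_lift(p_agent: float, delta: float) -> float:
    """Return (p_agent - delta) / delta, the fractional increase over baseline.

    p_agent: probability of the rare-valid set once the agent acts.
    delta:   probability of the same set under the passive baseline.
    """
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must lie in (0, 1]")
    if not 0.0 <= p_agent <= 1.0:
        raise ValueError("p_agent must lie in [0, 1]")
    return (p_agent - delta) / delta


def max_possible_lift(delta: float) -> float:
    """Trivial ceiling on lift: the rare-valid set receives probability 1."""
    return rare_valid_lift(1.0, delta)


if __name__ == "__main__":
    delta = 0.001  # hypothetical baseline chance of the target set
    for p_agent in (0.001, 0.01, 0.5):
        print(p_agent, rare_valid_lift(p_agent, delta))
```

A lift of zero means the agent left the odds unchanged, and a negative lift means it made the target outcomes less likely than the baseline does.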
They also measure how well the agent’s inner simulation spots the right rare-valid futures (its “fidelity”). High fidelity means the agent correctly aims at the truly good, rare targets instead of wasting effort on wrong ones.
Finally, they link all of this to thermodynamics (the physics of energy and entropy). The key idea: you can’t boost the chances of rare outcomes for free. Changing the odds costs information and energy, and the math of that cost shows up in their formulas.
What Did They Find?
- You can’t get high lift without accurate self-simulation.
- Necessity: If your inner model can’t correctly identify the rare-valid futures, you can’t significantly increase their odds—even if you act strongly. In other words, power without aim doesn’t make you “intelligent” by this measure.
- This is formalized in a theorem: poor rare-valid fidelity puts a ceiling on how high your lift can go.
- If you aim well and can act effectively, you can get near the best possible lift.
- Near-sufficiency: With high-fidelity identification and an effective policy (a good way to act), your lift approaches a limit set by how strongly you can act. So “good aim + good action” nearly guarantees high measured intelligence.
- Boosting rare-valid outcomes requires “paying” in information/energy terms.
- They show that increasing the probability of rare-valid futures forces your overall path-of-the-world distribution to move away from the baseline (this shows up as a KL divergence—think “distance” between probability laws). In plain terms: making rare good things more likely requires real work in an information sense.
- Mistakes (false positives) cost you.
- If your model wrongly marks some rare outcomes as “valid” when they aren’t, you waste effort and pay an extra thermodynamic cost to detect/correct/erase those errors. The paper provides formulas for this penalty, including a classic “Landauer” cost for erasing information.
- As your self-simulation improves, the false-positive penalty shrinks.
- The framework works across very different systems.
- Maxwell’s demon (a thought experiment where a tiny “agent” sorts fast and slow gas particles) represents an extreme, high-lift case when the demon’s model and actions are nearly perfect.
- For symbolic systems like humans or LLMs as text generators, you can define a baseline and ask: how much does the generator increase the odds of rare, valid sequences (correct, meaningful, task-relevant text) compared to that baseline?
Why this matters: These results say intelligence—as “making rare-valid futures more likely”—depends on good internal models and real physical costs. It’s not just randomness or brute force.
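One way to see why raising the odds of a rare set forces the whole path distribution away from the baseline is the data-processing inequality, a standard information-theory fact: the KL divergence between two path laws is at least the binary KL divergence between the probabilities they assign to any single event. The sketch below applies that inequality to the rare-valid set. It is an illustration under that assumption, not a reproduction of the paper's theorem, and the function names are hypothetical.

```python
import math


def binary_kl(p: float, q: float) -> float:
    """Binary KL divergence d(p || q) in nats between Bernoulli(p) and Bernoulli(q)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    total = 0.0
    if p > 0.0:  # the term p * log(p / q) is 0 when p == 0
        total += p * math.log(p / q)
    if p < 1.0:  # the term (1 - p) * log(...) is 0 when p == 1
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total


def min_path_divergence(lift: float, delta: float) -> float:
    """Lower bound (nats) on KL(P || P0) implied by a given lift on a set of baseline mass delta."""
    p_agent = delta * (1.0 + lift)  # invert lift = (p_agent - delta) / delta
    if p_agent < 0.0 or p_agent > 1.0 + 1e-12:
        raise ValueError("lift is not achievable for this delta")
    return binary_kl(min(p_agent, 1.0), delta)
```

For example, `min_path_divergence(9, 0.01)` gives the smallest divergence from the baseline that any agent must incur to raise a 1% event to 10%.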
Why Are These Results Important?
- A single, universal scale: The same measure can be used for a thermostat, a chess engine, a human writer, an LLM, a bacterial colony, or a hypothetical demon. That makes comparisons fairer across very different domains.
- Clarity about “what intelligence does”: Instead of focusing on tasks or rewards only, this focuses on the physical/probabilistic effect of intelligence—shifting the future toward unlikely but valid wins.
- Design guidance: To build smarter systems, improve their self-simulation fidelity (accurately predict which rare futures are both possible and good) and their ability to act on that knowledge.
- Honesty about costs: Intelligence has information and energy footprints. The more you change the odds, the more you must “pay” in thermodynamic terms.
What Could This Change Going Forward?
- Measuring progress: Researchers and engineers could evaluate systems by how much they increase the chances of rare, valid successes (at a stated resolution and baseline), not just by task-specific scores.
- Smarter planning and safety: Systems that clearly model themselves “inside the world” and aim precisely at valid futures are more effective and potentially safer, because they waste less effort on invalid paths.
- Cross-disciplinary bridges: This connects AI, control theory, information theory, and physics. It may inspire new ways to design intelligent controllers and to understand biological intelligence.
- Better benchmarks: For LLMs or robots, we could craft baselines and rare-valid targets that reveal whether a system truly makes the unlikely-but-correct happen more often—and not by luck.
In one sentence: Intelligence, in this view, is the lawful and energy-aware power to make good, unlikely futures happen—by thinking ahead about the world and oneself, aiming accurately, and acting effectively.
Knowledge Gaps
Knowledge gaps, limitations, and open questions
Below is a single, consolidated list of concrete gaps and open questions that the paper leaves unresolved, intended to guide future research.
- Baseline law selection: How to choose, justify, and empirically estimate the passive baseline distribution P₀ in diverse domains (physical, biological, symbolic) without circularity, and how sensitive the measure is to different plausible baselines.
- Validity definition: How to operationalize and audit the “valid” set V across domains (e.g., physical admissibility, semantic coherence, executable correctness), including procedures, annotators/validators, and inter-rater reliability.
- Resolution dependence: How to select the observational partition/resolution η, quantify sensitivity of Iδ to η, and develop multi-resolution or resolution-invariant comparisons.
- Limit existence and scaling: Conditions under which the limit I = lim_{δ→0⁺} Iδ exists, how to compare systems when the limit does not exist, and how to summarize across a δ-schedule in practice.
- Zero-probability but valid events: How to handle support mismatch where valid trajectories have P₀-measure zero (violating absolute continuity assumptions); whether to use smoothing or alternative lift definitions.
- Finite-sample estimation: Practical estimators for Iδ and associated uncertainty under data scarcity (rare-event regime), including variance control, confidence intervals, and sample complexity.
- Rare-event methodology: Concrete algorithms (e.g., importance sampling, splitting, cross-entropy methods) to efficiently estimate probability mass in Vδ and its lift in high-dimensional trajectory spaces.
- Measurement protocols: End-to-end experimental designs to measure thermodynamic intelligence in real systems (controllers, biological agents, LLMs/humans), including how to collect enough trajectories for small δ.
- Full thermodynamic accounting: A complete energy-information balance (measurement, memory, computation, actuation, erasure) linked to Iδ, beyond the coarse-grained bookkeeping; how to estimate or bound these costs empirically.
- Deriving actuation limits: Methods to translate physical/algorithmic resource constraints into amplification bounds (α_min, α_max) used in the theorems; how to connect these to control authority or power budgets.
- Coarse-grained entropy signatures: How to relate the event-level correction J_s(B) and Δ_s bounds to experimentally accessible thermodynamic quantities and verify them in controlled studies.
- Tightness and achievability: Are the fidelity–lift bounds (necessity and near-sufficiency) tight? Under what conditions are they achievable, and what gap remains in realistic systems?
- Model-to-control stability: Concrete, verifiable conditions under which D_KL(P_B^{(k)} || P_B*) ≤ C ρ(ε_k) holds; how to estimate C and ρ, and how violations impact measured intelligence.
- Rare-set stability: General conditions ensuring p_err^{(k)} ≤ ρ_δ(ε_k) (beyond the appendix’s sufficient condition), and procedures to verify boundary margins for rare-valid sets in practice.
- Non-conservative identification: Treatment of false negatives (V’δ ⊂ Vδ) and mixed error patterns; how to optimally allocate amplification under misclassification trade-offs and derive the full error–cost frontier.
- Causal identification: How to causally attribute lift to the system rather than confounders (e.g., environment drift, other agents); experimental interventions or counterfactual designs to isolate P vs P₀ changes induced by the agent.
- Time horizon and nonstationarity: Dependence of Iδ on trajectory length T, handling nonstationary P₀ and evolving agents (learning), and protocols for online or time-localized intelligence measurement.
- Cross-system comparability: Standards to make Iδ comparable across tasks, domains, and substrates; guidelines for choosing δ, η, and V to ensure fair comparisons and avoid benchmark gaming.
- Goodharting and adversarial validity: Defenses against agents inflating Iδ by exploiting flaws in the validity test (especially in symbolic domains); robust validators and adversarial testing.
- Safety and externalities: Incorporating safety/ethical constraints—amplifying rare-valid futures may be harmful; how to penalize or constrain lift that yields undesirable side effects not captured by V.
- Negative lift interpretation: How to interpret and report Iδ < 0 (de-amplification of rare-valid futures), including whether this corresponds to “anti-intelligence,” risk aversion, or other phenomena.
- Links to standard AI metrics: Formal relationships between rare-valid lift and reward maximization, regret, generalization, or compression; when does higher Iδ imply better task performance?
- Algorithmic tools: Practical methods to learn/approximate rare-valid sets from models and to compute policies that maximize lift subject to costs, including scalable surrogates for high-dimensional control.
- Maxwell-demon realizability: Realistic, experimentally grounded protocols (with friction, back-action, finite measurement precision) that approach high lift; quantitative costs that bound achievable I in practice.
- Symbolic case concreteness: A reproducible pipeline for LLM/human text—what is P₀ for text, how V is defined (grammar, semantics, factuality), what n and δ are used, and how to report robust numbers (promised calibrations are not provided).
- Support mismatch in language: How to treat out-of-vocabulary or novel constructs where P₀ assigns zero or near-zero probability, and how to calibrate validity in open-ended semantics.
- δ-schedule and compressed scale: Formal properties, calibration, and interpretability of the proposed compressed scale Λ = log₁₀(log₁₀(I+1)+1); how to choose δ consistently across cases and report Λ with uncertainty.
- Partition dependence: Quantifying how much Iδ varies across different partitions of trajectory space and developing partition-robust or invariant statistics.
- Multi-agent and collective intelligence: Formalizing nested/interactive self-models, decomposing lift across agents, and attribution in collectives (credit assignment, synergy vs interference).
- Cost–benefit trade-offs: Integrating lift with costs (energy, time, risk) to form an efficiency metric rather than a pure amplification score; Pareto analyses across lift and cost dimensions.
- Coverage vs tail-seeking: When intelligent behavior should prioritize stabilizing common valid futures instead of tail amplification; conditions under which rare-valid lift is a sufficient proxy for “intelligence.”
- Continuous-time dynamics: Extensions to SDEs and deterministic chaotic systems with explicit Girsanov-type formulations, including practical computation of RN derivatives and lift in continuous domains.
- Optimization granularity: Allowing spatially varying amplification factors rather than constant α on V_δ; optimal shaping of dP/dP₀ across the boundary to maximize lift per unit cost.
- Decomposition of capabilities: Methods to disentangle perception, memory, planning, and actuation contributions to lift, enabling capability diagnosis and targeted improvements.
- Benchmarking and reproducibility: Public benchmarks with defined P₀, V, η, δ, data collection protocols, and scoring baselines to enable cross-paper comparisons, ablations, and progress tracking.
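The compressed scale mentioned in the list above, Λ = log₁₀(log₁₀(I+1)+1), is simple to compute and invert. The sketch below is only a direct transcription of that formula; the function names are hypothetical, and the domain check reflects the fact that the expression is undefined once log₁₀(I+1)+1 is no longer positive.

```python
import math


def compressed_scale(lift: float) -> float:
    """Compute Lambda = log10(log10(I + 1) + 1) for a lift value I."""
    if lift <= -1.0:
        raise ValueError("lift must be greater than -1")
    inner = math.log10(lift + 1.0) + 1.0
    if inner <= 0.0:
        raise ValueError("Lambda is undefined for lift values this negative")
    return math.log10(inner)


def lift_from_scale(lam: float) -> float:
    """Invert the compressed scale: I = 10 ** (10 ** Lambda - 1) - 1."""
    return 10.0 ** (10.0 ** lam - 1.0) - 1.0
```

On this scale a lift of 0 maps to Λ = 0, and very large lifts are squeezed into a narrow range, which is what makes widely different systems plottable on one axis.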
Practical Applications
Immediate Applications
The following applications can be deployed with current methods and data, provided baselines, validity criteria, and finite-resolution partitions are specified.
- Application: Rare-valid lift benchmarking for AI models (LLMs, code generators, theorem provers)
- Sectors: Software, education, research
- Tools/products/workflows:
- An evaluation harness that defines a baseline generator (e.g., n-gram, smaller model, or pre-instruction-tuned model), a validity oracle for tasks (syntax checks, unit tests, proof checkers), a finite partition η, and a target rare-valid set Vδ; compute Iδ by Monte Carlo
- Reporting dashboards for Iδ curves across δ, and the compressed scale Λ for cross-model comparability
- Assumptions/dependencies:
- Clear choice of P₀ and validity tests; enough samples to estimate tail probabilities; careful choice of resolution η and δ to limit variance; guardrails against Goodharting (optimizing to the metric rather than validity)
- Application: Training objective shaping for rare-valid output
- Sectors: Software/AI
- Tools/products/workflows:
- Reinforcement learning or rejection sampling to upweight candidate outputs in the identified rare-valid set V’δ while tracking the false-positive penalty (per the imperfect-identification accounting)
- Incorporate a rare-valid fidelity proxy (precision of V’δ vs Vδ) into the loss to increase fidelity and reduce false positives
- Assumptions/dependencies:
- Availability of automated validators or human-in-the-loop adjudication; calibration of δ to maintain learnability
- Application: Safety and robustness evaluation for autonomous robots and vehicles
- Sectors: Robotics, automotive, aerospace
- Tools/products/workflows:
- Simulation-based estimation of Iδ for rare but valid success states under extreme conditions (e.g., sudden obstacles, sensor dropouts); P₀ from a passive or rule-based controller
- Pre-release signoff that Iδ exceeds a threshold at specified stress scenarios
- Assumptions/dependencies:
- High-fidelity simulators; clear validity criteria for “success” and safety; environment shift coverage
- Application: Industrial control and process optimization under extremes
- Sectors: Manufacturing, energy, process control
- Tools/products/workflows:
- Controller A/B testing using path-law shift: estimate Iδ for maintaining quality/SLOs under disturbances (temperature spikes, supply variance)
- Use Theorem 1 (necessity) to prioritize model components that raise rare-valid fidelity over brute-force actuation
- Assumptions/dependencies:
- Sufficient logging to learn P₀; small-signal approximations to keep bins nondegenerate in thermodynamic audits
- Application: SRE/AIOps incident response effectiveness
- Sectors: Software operations
- Tools/products/workflows:
- Define V as “meeting SLOs under incident class X”; P₀ from historical passive response; compute Iδ for proposed runbooks/automation
- Assumptions/dependencies:
- Accurate incident labeling and replay; validity aligned with user impact metrics
- Application: Clinical decision support for rare event management
- Sectors: Healthcare
- Tools/products/workflows:
- Evaluate triage or alerting systems by their lift for early detection of rare but actionable conditions (e.g., sepsis); P₀ is standard-of-care
- Assumptions/dependencies:
- Strict validity definitions (clinical guidelines); bias and fairness audits; IRB/ethics oversight
- Application: Risk management and trading under stress
- Sectors: Finance
- Tools/products/workflows:
- Backtest policies on tail scenarios; define V as drawdown-limited, capital-conserving trajectories; P₀ from historical passive benchmarks
- Penalize false positives via bookkeeping to avoid spurious “tail wins”
- Assumptions/dependencies:
- Nonstationarity handling; robust scenario generation; regulatory compliance
- Application: Policy benchmarking of AI systems
- Sectors: Policy, standards
- Tools/products/workflows:
- Prototype “Thermodynamic Intelligence Benchmark (TIB)” profiles: publish Iδ across tasks with declared P₀, V, η; include error-penalty accounting
- Assumptions/dependencies:
- Community consensus on baselines and validity; transparency about evaluation context and sampling error
- Application: Laboratory audits of feedback devices via coarse-grained entropy-bins
- Sectors: Physics, materials, nano-systems
- Tools/products/workflows:
- Measure changes in entropy-production log-ratios across matched bins; compare with path-divergence bounds (Theorem: path-deviation bound) to validate feedback control claims
- Assumptions/dependencies:
- Nondegenerate bins; reliable measurement of trajectories; physical cost accounting remains distinct
- Application: Digital twin planning with rare-valid targeting
- Sectors: Manufacturing, logistics, smart buildings
- Tools/products/workflows:
- Use the twin to simulate Vδ (e.g., on-time delivery under storms) and select policies that maximize Iδ; report rare-valid fidelity as a model sufficiency metric
- Assumptions/dependencies:
- Twin fidelity for intervention-relevant features (Assumption: model-to-control stability); scenario coverage
- Application: Education and competition design
- Sectors: Education
- Tools/products/workflows:
- Problem sets where Vδ encodes rare solution paths; score learners or agents by lift beyond baseline heuristics
- Assumptions/dependencies:
- Fair baselines (e.g., common heuristics); validity via auto-graders or rubrics
- Application: Human-in-the-loop workflow tuning
- Sectors: Customer support, content moderation
- Tools/products/workflows:
- Measure how tools + SOPs lift probability of rare-valid resolutions (e.g., de-escalations) relative to P₀ (historical logs)
- Assumptions/dependencies:
- Stable logging; careful definition of “valid” to avoid perverse incentives
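The evaluation harness described in the first application above (a baseline generator, a validity oracle, and a Monte Carlo estimate of lift) can be sketched on a toy task. Everything here is an assumption made for illustration: the task (balanced parenthesis strings), the two generators, and all function names are hypothetical, and the finite partition η is omitted for brevity.

```python
import random
from typing import Callable, Tuple

Generator = Callable[[random.Random], str]


def is_balanced(text: str) -> bool:
    """Validity oracle for the toy task: parentheses must be balanced."""
    depth = 0
    for ch in text:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0


def baseline_generator(rng: random.Random, length: int = 8) -> str:
    """Passive baseline: each character is '(' or ')' uniformly at random."""
    return "".join(rng.choice("()") for _ in range(length))


def candidate_generator(rng: random.Random, length: int = 8) -> str:
    """Toy 'agent': tracks nesting depth so the result stays balanced (even length)."""
    out, depth = [], 0
    for i in range(length):
        remaining = length - i
        if depth == 0:
            ch = "("                # cannot close below zero
        elif depth >= remaining:
            ch = ")"                # must close everything that is open
        else:
            ch = rng.choice("()")
        depth += 1 if ch == "(" else -1
        out.append(ch)
    return "".join(out)


def estimate_mass(generator: Generator, oracle: Callable[[str], bool],
                  n_samples: int, seed: int) -> float:
    """Monte Carlo estimate of the probability that a sample passes the oracle."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_samples) if oracle(generator(rng)))
    return hits / n_samples


def monte_carlo_lift(baseline: Generator, candidate: Generator,
                     oracle: Callable[[str], bool],
                     n_samples: int = 20000, seed: int = 0) -> Tuple[float, float, float]:
    """Return (estimated lift, estimated delta, estimated agent probability)."""
    delta_hat = estimate_mass(baseline, oracle, n_samples, seed)
    if delta_hat == 0.0:
        raise ValueError("no valid baseline samples; increase n_samples")
    p_hat = estimate_mass(candidate, oracle, n_samples, seed + 1)
    return (p_hat - delta_hat) / delta_hat, delta_hat, p_hat
```

In a real benchmark the baseline would be something like a smaller model and the oracle a unit-test runner or proof checker. Plain sampling of this kind becomes impractical as δ shrinks, because the baseline estimate needs many valid hits to be stable.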
Long-Term Applications
These applications require further research, scaling, or development of instrumentation, theory, or engineering.
- Application: Universal cross-domain intelligence scale adoption
- Sectors: AI, biology, neuroscience, policy
- Tools/products/workflows:
- A standardized “Intelligence Meter” reporting Iδ and Λ across agents (LLMs, robots, microbial collectives, humans-as-text-generators), with domain-specific P₀, V, η
- Assumptions/dependencies:
- Broad agreement on baselines and validity; robust small-δ estimators; governance to prevent metric gaming
- Application: Recursive self-simulation architectures with guaranteed rare-valid fidelity
- Sectors: AI, robotics
- Tools/products/workflows:
- Model hierarchies explicitly optimizing rare-valid fidelity and enforcing sufficiency conditions (Theorem: near-sufficiency); planners that simulate self-updates and other agents
- Assumptions/dependencies:
- Scalable self-modeling; verifiable policy classes; stability of rare-set boundaries under model refinement
- Application: Nanoscale information engines and molecular robots
- Sectors: Nanotechnology, materials, biophysics
- Tools/products/workflows:
- Devices that approach demon-like “lawful amplification” in micro-environments; design by maximizing Iδ per joule while tracking full thermodynamic costs
- Assumptions/dependencies:
- Advanced sensing/actuation at molecular scales; precise bookkeeping of measurement/erasure; noise-robust control
- Application: City- and grid-scale rare-event resilience controllers
- Sectors: Energy, transportation, civil infrastructure
- Tools/products/workflows:
- Controllers that increase probability of valid stability trajectories during extreme weather or cascading faults; continuous Iδ monitoring from telemetry
- Assumptions/dependencies:
- High-fidelity digital twins; secure data infrastructure; regulatory alignment; fail-safe mechanisms
- Application: Autonomous clinical AI with patient-specific digital twins
- Sectors: Healthcare
- Tools/products/workflows:
- Personalized recursive self-simulation to select interventions that amplify rare-valid outcomes (avoid adverse events) while computing false-positive overhead and thermodynamic cost bounds
- Assumptions/dependencies:
- Reliable causal models; real-time data access; strong clinical governance and interpretability
- Application: Governance frameworks linking capability to energy/information accounting
- Sectors: Policy, sustainability
- Tools/products/workflows:
- Licensing regimes or reporting standards requiring Iδ profiles and path-divergence/entropy-bins audits for safety-critical AI
- Assumptions/dependencies:
- Agreed-upon protocols for binning and divergence thresholds; enforceability; standardized disclosures
- Application: Black-swan mitigation platforms
- Sectors: Finance, supply chain, public safety
- Tools/products/workflows:
- Systems that explicitly optimize rare-valid lift for continuity under extreme contingencies; cross-sector digital twin networks
- Assumptions/dependencies:
- Data sharing across silos; reliable stress scenario modeling; integrated control rights
- Application: Household and personal digital assistants with self-models
- Sectors: Consumer software, IoT
- Tools/products/workflows:
- Assistants that recursively simulate user behavior and environment to amplify rare-valid goals (e.g., adherence to health plans) while minimizing false positives
- Assumptions/dependencies:
- Privacy-preserving modeling; smooth human-AI interfaces; validation of “valid outcomes” definitions
- Application: Rare-valid curriculum design for human learning
- Sectors: Education, workforce training
- Tools/products/workflows:
- Tools that identify and train for rare but valid strategies; track Iδ improvement as skill matures
- Assumptions/dependencies:
- Validity aligned with pedagogical goals; longitudinal data; fairness considerations
- Application: Cross-agent social modeling (theory of mind) in multi-agent systems
- Sectors: Robotics, gaming, defense
- Tools/products/workflows:
- Controllers that embed models of other agents to anticipate them and achieve high Iδ in competitive/cooperative settings
- Assumptions/dependencies:
- Reliable opponent/ally modeling; sample-efficient learning in rare regimes; safety constraints
- Application: Integrated metric-guarded optimization to resist gaming
- Sectors: AI safety, governance
- Tools/products/workflows:
- Optimization frameworks that co-optimize Iδ, rare-valid fidelity, and the false-positive penalty; external audits of P₀ and V
- Assumptions/dependencies:
- Third-party validators; robust estimators in the small-δ regime; adversarial testing
- Application: Cross-disciplinary scientific discovery engines
- Sectors: Science, R&D
- Tools/products/workflows:
- Agents that seek rare-valid hypotheses/experiments (unlikely under passive priors but admissible by constraints), maximizing Iδ subject to lab and safety constraints
- Assumptions/dependencies:
- High-fidelity simulators; reproducible validity criteria; experiment automation
Notes on Feasibility and Dependencies (common to many applications)
- Baseline definition (P₀): Must be defensible and documented (historical logs, simpler models, or passive dynamics); different choices change Iδ.
- Validity definition (V): Domain-specific, must be auditable; include semantic/functional/physical constraints; avoid narrow proxies that invite Goodhart’s Law.
- Resolution and target mass (η, δ): Choose η to balance granularity and sample complexity; select δ in a regime with adequate statistical power; report confidence intervals.
- Absolute continuity and bins: For entropy-bin diagnostics, ensure nondegenerate bins and sufficient counts; path-divergence estimates must be stable.
- False-positive overhead: Track the false-positive rate and apply protocol-specific costs (e.g., Landauer erasure costs) to account for misidentified rare sets.
- Implementation limits: Theorems separate potential from realized lift; actuation, sensing, and compute budgets can cap achievable Iδ.
- Ethics and safety: Especially in healthcare, finance, and policy, governance must constrain optimization to valid, beneficial futures and include fairness and accountability.
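The advice above to report confidence intervals can be followed with a standard binomial interval when δ is fixed by design and only the agent's success probability is estimated from samples. The sketch below uses the Wilson score interval, a textbook method chosen here for illustration rather than one the paper prescribes. The function names are hypothetical, and treating δ as known exactly is an assumption that does not hold when the baseline is itself estimated.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def lift_interval(successes: int, n: int, delta: float,
                  z: float = 1.96) -> Tuple[float, float]:
    """Map the interval on the agent's success probability to an interval on lift.

    Assumes delta is known exactly, so lift is a linear function of the proportion.
    """
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must lie in (0, 1]")
    low, high = wilson_interval(successes, n, z)
    return (low - delta) / delta, (high - delta) / delta
```

Because lift divides by δ, the interval on lift widens as δ shrinks, which is one concrete reason the choice of δ has to balance rarity against statistical power.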
Glossary
- Absolute continuity (P ≪ Q): A measure-theoretic relation meaning that P assigns zero probability to every set to which Q assigns zero probability. Example: "Assume ."
- Actuation-limited optimum: The best achievable amplification given the physical limits on how strongly a system can act on the environment. Example: "the achievable lift approaches the actuation-limited optimum."
- Baseline law: The passive or reference probability distribution over trajectories against which changes are measured. Example: "once a level of description, baseline law, validity criterion, and observational resolution are fixed."
- Coarse-grained thermodynamic signatures: Finite-resolution summaries of thermodynamic behavior (e.g., entropy-production patterns) derived from binned trajectory data. Example: "coarse-grained thermodynamic signatures induced by nearby path laws."
- Coarse-graining (event-level): Aggregating detailed microstates or events into bins to analyze system behavior at a finite resolution. Example: "The symbol marks event-level coarse-graining"
- Dimensionless entropy production: Entropy production divided by Boltzmann’s constant k_B, so that it is a pure number. Example: "define the dimensionless entropy production"
- Embedded representation hierarchy: A recursive internal modeling scheme where a system’s model includes itself and models of its own models. Example: "We formalize this self-in-world requirement as an embedded representation hierarchy."
- Entropy-production bins: Paired sets of trajectories grouped by having approximately equal and opposite entropy production. Example: "matched entropy-production bins and "
- Event-level information correction: A correction term that adjusts entropy-production log-ratio comparisons under feedback control. Example: "we therefore define the event-level information correction directly:"
- Feedback fluctuation relations: Generalizations of fluctuation theorems that account for measurement and control (feedback) in thermodynamic systems. Example: "Microscopic feedback fluctuation relations generally contain trajectory-dependent measurement and information terms"
- Fluctuation-theorem scale: The scaling relation between probabilities of entropy-producing and entropy-reducing events predicted by fluctuation theorems. Example: "the fluctuation-theorem scale gives"
- Information engine: A system that uses information to bias or transform physical trajectories, often performing work in the process. Example: "Maxwell-demon-like information engines."
- Intelligence potential: The maximum rare-valid probability lift achievable within an internal simulation, independent of whether it is realized. Example: "The intelligence potential for level is computed inside the level- simulation of level ."
- KL divergence: Kullback–Leibler divergence, a measure of how one probability distribution diverges from another; the binary version applies to two-outcome events. Example: "binary KL divergence"
- Landauer bookkeeping protocol: An accounting scheme for entropy or energy costs of information processing, proportional to surprisal. Example: "baseline-surprisal Landauer bookkeeping protocol"
- Legg–Hutter intelligence: A formal definition of intelligence as expected reward across a universal distribution of environments. Example: "Legg--Hutter intelligence defines an agent's intelligence by its expected reward over a universal distribution of computable environments"
- Maxwell–Boltzmann gas: A classical model of particle speed/energy distributions in ideal gases. Example: "three-dimensional Maxwell--Boltzmann gas"
- Maxwell's demon: A hypothetical agent that uses microscopic information to create apparent local entropy reductions. Example: "Maxwell's demon provides the canonical historical case"
- Modulus of continuity: A function controlling how a quantity (e.g., divergence) changes with approximation error, vanishing as error goes to zero. Example: "There exists a modulus of continuity "
- Nondegenerate entropy bins: A technical condition ensuring each entropy bin has at least a minimum probability under the measures compared. Example: "Nondegenerate entropy bins"
- Non-equilibrium fluctuation theorems: Results relating probabilities of forward/backward (entropy-producing/reducing) trajectories outside equilibrium. Example: "non-equilibrium fluctuation theorems"
- Observational resolution: The finite granularity at which trajectories are partitioned for measurement and analysis. Example: "finite observational resolution"
- Path measure: A probability measure over entire system trajectories rather than states at single times. Example: "passive path measure"
- Passive dynamics: The natural, uncontrolled evolution of a system without intervention. Example: "under passive dynamics"
- Policy class: The set of candidate policies or control laws considered within a model or simulation. Example: "Let be the simulated policy class."
- Rare-valid lift: The fractional increase in probability of rare yet valid trajectories relative to a passive baseline. Example: "rare-valid lift"
- Rare-valid set: The subset of trajectories that are both rare under the baseline and admissible under domain constraints. Example: "rare-valid set "
- Recursive self-simulation: An internal modeling process where a system simulates futures that include its own actions and future states. Example: "Recursive self-simulation becomes operational"
- σ-algebra: A collection of sets that contains the whole space and is closed under complements and countable unions, defining the measurable structure. Example: "σ-algebra"
- Supremal (simulated rare-valid lift): The least upper bound of achievable lift within a given policy class in simulation. Example: "is the supremal simulated rare-valid lift"
- Theory-of-mind (style modeling): Modeling that represents other agents’ beliefs, perceptions, and actions within one’s own predictive framework. Example: "the thermodynamic analogue of theory-of-mind style modeling"
- Thermodynamic intelligence: A proposed measure of intelligence defined as rare-valid probability lift with thermodynamic grounding. Example: "we define thermodynamic intelligence as rare-valid probability lift"
- Trajectory functional: A function defined on entire trajectories used to evaluate and select actions. Example: "evaluates a trajectory functional "
- Trajectory space: The set of all possible trajectories considered at a given modeling level. Example: "the trajectory space at level "