Reformulation Invariance and the Axiomatic Foundations of Inference

Published 19 Jun 2026 in math.ST, cs.IT, math.CT, and math.PR | (2606.21551v1)

Abstract: Maximum entropy, Bayesian updating, and exponential-family estimation are all instances of a common inference principle: selecting the measure or distribution that minimizes a divergence subject to the available constraints. Which divergence to use is usually decided by analytic convenience, by empirical performance, or by a set of axioms chosen to single it out, leaving open a basic question: why one divergence and not another? We answer it from a single requirement: an inference method should return the same answer whenever the same problem is presented in an equivalent form, for instance, after simply renaming its parts. This requirement alone forces inference to be the minimisation of a classical divergence, and each further reformulation it must respect tightens the admissible family one notch, narrowing the broad f-divergences to the α-divergences and finally to the single Kullback-Leibler (KL) divergence. Mathematically, inference is recast from minimising a numerical functional to selecting a least element under a preorder on positive measures, a divergence being merely one numerical scale that reproduces that preorder. The reformulations are the morphisms of a category of inference problems, and the invariance requirement says the inference operator is a covariant functor into the category of statistical models of Cencov, mirroring his characterisation of the Fisher metric. The representation is proved on finite spaces and lifted to general measurable spaces by an elementary closure, covering discrete and continuous spaces alike. Earlier axiomatisations, such as those of Shore-Johnson and Csiszar, postulate their consistency axioms directly and only on finite alphabets; here the axioms follow from reformulation invariance alone.


Authors (3)

Summary

The paper shows that reformulation invariance uniquely leads to f-divergence-based inference and that stronger invariance narrows the admissible divergences to the KL divergence.
It introduces a categorical framework unifying Bayesian, maximum-entropy, and exponential-family inference via preorder minimization over measures.
The work extends representation theorems from finite spaces to general measurable spaces, providing a robust foundation for statistical inference.

Reformulation Invariance as the Foundation for Canonical Inference

Overview

The paper "Reformulation Invariance and the Axiomatic Foundations of Inference" (2606.21551) develops a unified axiomatic characterization of inference as the minimization of classical statistical divergences. The authors show that the abstract requirement of reformulation invariance, namely that equivalent presentations of the same inference problem always yield the same solution, singles out $f$-divergence-based inference, and that progressively stronger invariance assumptions narrow the admissible divergences first to the $\alpha$-divergences and finally to the Kullback-Leibler (KL) divergence. The analysis is conducted categorically via functoriality into Čencov's category of statistical models, mirroring the uniqueness of the Fisher metric in information geometry. The representation theorems, previously limited to finite alphabets, are lifted to general measurable spaces through an order-theoretic closure construction.

Core Contributions

This work recasts statistical inference as the operation of selecting the "least element" under a suitable preorder on (positive finite) measures, with divergences demoted to secondary roles as numerical scales representing these orderings. The main results are:

Reformulation invariance: The requirement that the inference operator commutes with any information-preserving reformulation (i.e., measurable relabelling, coarsening, and splitting) is formalized categorically as a covariant functor between the category of inference problems and Čencov's category of statistical models. This property alone compels inference to be characterized as the minimization of a classical divergence.
Hierarchical narrowing: Additional invariances (mass-distribution independence, then moment-distribution independence under positive measurable reweightings) successively restrict the family of admissible divergences, first to $\alpha$-divergences, then uniquely to the KL divergence.
Order-theoretic foundation: Rather than treating divergences as primitive analytic objects, the approach identifies the underlying preorder on measures as fundamental, with divergences providing numerical representations of this order.
Extension to general measurable spaces: The representation theorems are extended from finite spaces to countable, continuous, and abstract measurable spaces by constructing closures over increasingly fine finite partitions, bypassing previous limitations and technical obstacles in functional analysis.

Technical Details

Category-Theoretic Framework

Inference problems are modeled as a category whose morphisms correspond to measurable maps between sample spaces, encoding reformulations via splitting (disjoint subproblems), coarsening (lowering resolution), and relabelling. The inference operator is required to be a covariant functor from the category of constraint sets (information) to statistical models (measures), thus formalizing invariance under reformulation. This structure is strictly parallel to Čencov's characterization of Markov morphisms and congruence for metrics in information geometry.
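As a small illustration of the three reformulations named above, the sketch below checks numerically that a KL-type $f$-divergence on a finite space is unchanged by relabelling and by splitting an atom in the same proportion for both measures, and that it does not increase under coarsening. This is the standard behaviour of $f$-divergences, not the paper's categorical construction; the helper names (`relabel`, `split`, `coarsen`) and the example measures are hypothetical.

```python
import math

def f_divergence(p, q, f):
    """Sum_i q_i * f(p_i / q_i) for strictly positive finite measures p, q."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def kl_generator(t):
    # Convex generator of the KL divergence on positive measures.
    return t * math.log(t) - t + 1.0

def relabel(m, perm):
    """Rename atoms: new atom j carries the mass of old atom perm[j]."""
    return [m[i] for i in perm]

def split(m, i, lam):
    """Split atom i into two atoms carrying fractions lam and 1 - lam of its mass."""
    return m[:i] + [lam * m[i], (1.0 - lam) * m[i]] + m[i + 1:]

def coarsen(m, i):
    """Merge atoms i and i + 1 into a single atom."""
    return m[:i] + [m[i] + m[i + 1]] + m[i + 2:]

# Hypothetical example measures on a three-point space.
p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

base = f_divergence(p, q, kl_generator)
relabelled = f_divergence(relabel(p, [2, 0, 1]), relabel(q, [2, 0, 1]), kl_generator)
split_value = f_divergence(split(p, 1, 0.25), split(q, 1, 0.25), kl_generator)
coarse = f_divergence(coarsen(p, 0), coarsen(q, 0), kl_generator)

print(base, relabelled, split_value, coarse)
```

Splitting leaves the density ratio on the two new atoms equal to the ratio on the original atom, which is why the value is preserved; coarsening can only lose information, so by convexity of the generator the value cannot go up.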

Hierarchy of Consistency Axioms

A sequence of logically motivated axioms, each corresponding to invariance under a specific transformation, is introduced:

Locality/Isolated system: Information about one region should not affect inferred distributions elsewhere.
Coarse-grain consistency: Solutions at coarser resolutions must coincide with restrictions of fine-grained solutions.
Prior-consistency: When fine-level information is absent, the split over fine atoms must only depend on the prior, not on the total mass.
Mass-distribution independence: Inference on the distribution must remain invariant under changes to the total mass.
Moment-distribution independence: Invariance extends to all positive measurable reweightings (moments).

Each additional axiom constrains the admissible divergence:

Bare locality/coarse-grain invariance permits all $f$-divergences.
Mass-distribution independence narrows the family to $\alpha$-divergences.
Moment-distribution independence restricts to the KL divergence.
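For reference, one common convention for the generators behind this hierarchy is written out below. The parameterisation is an assumption taken from the standard literature on $\alpha$-divergences; the paper may normalise or index the family differently. The symbol $\Omega$ for the underlying space is introduced here for illustration only.

```latex
% Generator f in D_f(P \| Q) = \int f(dP/dQ) \, dQ, for positive measures P, Q on \Omega.
\[
f_\alpha(t) = \frac{t^{\alpha} - \alpha t - (1 - \alpha)}{\alpha(\alpha - 1)},
\qquad \alpha \notin \{0, 1\},
\]
\[
\lim_{\alpha \to 1} f_\alpha(t) = t \log t - t + 1,
\qquad
\lim_{\alpha \to 0} f_\alpha(t) = -\log t + t - 1 .
\]
% The alpha -> 1 limit gives the KL divergence on positive measures:
\[
D_{\mathrm{KL}}(P \| Q) = \int \log\frac{dP}{dQ} \, dP - P(\Omega) + Q(\Omega).
\]
```

Each $f_\alpha$ is convex with $f_\alpha(1) = 0$, and for probability measures the last two terms of the KL expression cancel, recovering the familiar form.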

Representation by Preorder and Divergence

The inference procedure is formalized as selection of minimally ranked elements under a preorder on measures. The ordering is shown (via the Debreu representation theorem and subsequent functional equations) to correspond to the minimization of an $f$-divergence:

$D_f(P \| Q) = \int f\left(\frac{dP}{dQ}\right) dQ$

The functional form of $f$ is determined by the set of invariances imposed: each stricter invariance shrinks the admissible family, and the strongest leaves only the KL divergence. The representation is first constructed on finite spaces and then extended to general measurable spaces via a closure operation using limits over finite partitions, which guarantees consistency with the order induced by discretizations.
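On a finite space the integral above reduces to a sum, which makes the roles of the generator $f$ easy to experiment with. The following is a minimal sketch, assuming strictly positive measures and a common textbook parameterisation of the $\alpha$-family (the paper's own normalisation may differ); the function names are hypothetical.

```python
import math

def f_divergence(p, q, f):
    """Finite-space D_f(P || Q) = sum_i q_i * f(p_i / q_i), with all q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def kl_generator(t):
    # f(t) = t log t - t + 1, the KL generator on positive measures.
    return t * math.log(t) - t + 1.0

def alpha_generator(alpha):
    """Assumed convention: f(t) = (t^a - a t - (1 - a)) / (a (a - 1)), a not in {0, 1}."""
    if alpha in (0.0, 1.0):
        raise ValueError("use the limiting generators for alpha = 0 or 1")
    def f(t):
        return (t ** alpha - alpha * t - (1.0 - alpha)) / (alpha * (alpha - 1.0))
    return f

# Hypothetical example measures (here both happen to be normalised).
p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

kl_value = f_divergence(p, q, kl_generator)
near_kl = f_divergence(p, q, alpha_generator(1.0 - 1e-6))  # alpha close to 1
half = f_divergence(p, q, alpha_generator(0.5))             # Hellinger-type member
hellinger_form = 2.0 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))

print(kl_value, near_kl, half, hellinger_form)
```

With this convention the member at $\alpha = 1/2$ is a multiple of the squared Hellinger distance, and values of $\alpha$ near 1 approach the KL value, which is the sense in which KL sits at the end of the family.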

Comparison with Prior Axiomatisations

Earlier axiomatisations (e.g., Shore-Johnson, Csiszár) postulate multiple independent numerical axioms, often do not attain uniqueness (particularly for continuous spaces), and do not use the $\sigma$-algebra as a fundamental primitive. This framework, by contrast, derives the needed axioms from reformulation invariance alone and bridges the gap between the intuitionistic and categorical approaches. Notably, it clarifies that the prior is not an ad hoc choice but is fixed by invariance requirements or, in the case of relabelling invariance, is necessarily uniform.

Implications and Theoretical Impact

Canonical Status of Divergences

The main implication is that the "classical" divergences (not merely the KL divergence, but the broad family including $f$- and $\alpha$-divergences) arise not from analytic necessity or empirical convenience, but as the unique representations of information processing that is rational in a rigorous, category-theoretic sense: invariant under all problem reformulations that preserve information.

Unification of Bayesian and Maximum Entropy Inference

The axiomatic hierarchy unifies maximum-entropy inference, Bayesian updating, and exponential-family estimation under the broader umbrella of divergence minimization. It does so by demonstrating compatibility with Bayesian conditioning and by showing that $f$-divergence minimization is the only additive-divergence inference consistent with Bayesian updating.
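Two textbook special cases make this unification concrete for the KL divergence: minimising KL subject to "all mass lies in an event" returns the Bayesian conditional, and minimising KL subject to a mean constraint returns an exponential-family tilt of the prior. The sketch below illustrates both on finite spaces; it is not the paper's derivation, and the function names, the bisection bounds, and the die example are assumptions chosen for illustration.

```python
import math

def kl(p, q):
    """KL divergence between probability vectors; atoms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def condition(q, event):
    """Bayesian conditioning of q on a set of atom indices."""
    mass = sum(q[i] for i in event)
    return [q[i] / mass if i in event else 0.0 for i in range(len(q))]

def exponential_tilt(q, g, target, lo=-20.0, hi=20.0, iters=200):
    """Minimum-KL distribution relative to q whose mean of g equals target.

    The minimiser is proportional to q_i * exp(lam * g_i); lam is found by
    bisection because the tilted mean is increasing in lam.
    """
    def tilted(lam):
        w = [qi * math.exp(lam * gi) for qi, gi in zip(q, g)]
        z = sum(w)
        return [wi / z for wi in w]

    def mean(lam):
        return sum(pi * gi for pi, gi in zip(tilted(lam), g))

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(mid) < target:
            lo = mid
        else:
            hi = mid
    return tilted(0.5 * (lo + hi))

# Conditioning as a KL projection: the posterior beats another candidate on the same event.
prior = [0.1, 0.2, 0.3, 0.4]
posterior = condition(prior, {1, 2})
competitor = [0.0, 0.5, 0.5, 0.0]

# Maximum-entropy style problem: a die with uniform prior and prescribed mean 4.5.
faces = [1, 2, 3, 4, 5, 6]
die = exponential_tilt([1.0 / 6.0] * 6, faces, 4.5)
die_mean = sum(f * p for f, p in zip(faces, die))

print(posterior, kl(posterior, prior), kl(competitor, prior))
print(die, die_mean)
```

The KL cost of the conditional equals minus the log of the prior probability of the event, and any other distribution supported on the event pays strictly more, which is the sense in which conditioning is a divergence-minimising update.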

Extension to General Measurable Spaces

The closure-based lift allows the same principles to underlie inference not only on finite and countable sets, but on arbitrary measurable spaces (with appropriate domination assumptions). This resolves a major theoretical limitation of prior approaches and has implications for information geometry on infinite-dimensional models.
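The idea of reaching a continuous space through finite partitions can be illustrated with a classical fact: an $f$-divergence between two measures is the supremum of its values over finite measurable partitions, and refining a partition never lowers the value. The sketch below is not the paper's closure construction; it only evaluates the KL divergence between the unit-variance Gaussians $N(0,1)$ and $N(1,1)$ on nested dyadic partitions. The function names, the cut-off `half_width`, and the choice of Gaussians are assumptions for illustration.

```python
import math

def normal_cdf(x, mu):
    """P(X <= x) for X ~ N(mu, 1), via erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-(x - mu) / math.sqrt(2.0))

def normal_sf(x, mu):
    """P(X > x) for X ~ N(mu, 1), via erfc for accuracy in the upper tail."""
    return 0.5 * math.erfc((x - mu) / math.sqrt(2.0))

def cell_mass(a, b, mu):
    """Mass of N(mu, 1) on [a, b); upper tails are used right of the mean to limit cancellation."""
    if a >= mu:
        return normal_sf(a, mu) - normal_sf(b, mu)
    return normal_cdf(b, mu) - normal_cdf(a, mu)

def partition_masses(mu, level, half_width=8.0):
    """Masses on 2**level equal cells of [-half_width, half_width) plus the two tails."""
    n = 2 ** level
    h = 2.0 * half_width / n
    edges = [-half_width + k * h for k in range(n + 1)]
    interior = [cell_mass(edges[k], edges[k + 1], mu) for k in range(n)]
    return [normal_cdf(-half_width, mu)] + interior + [normal_sf(half_width, mu)]

def kl(p, q):
    """KL divergence between probability vectors; atoms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

# Each level halves every cell, so the partitions are nested refinements.
values = [kl(partition_masses(0.0, k), partition_masses(1.0, k)) for k in range(1, 8)]
exact = 0.5  # KL(N(0,1) || N(1,1)) = (difference of means)^2 / 2

for level, value in enumerate(values, start=1):
    print(level, value)
```

The printed values are nondecreasing in the refinement level and approach the continuous value 0.5 from below, which is the finite-to-general passage in its simplest numerical form.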

Potential for Future Work

Non-commutative extensions: The order-theoretic and categorical approach can extend naturally to quantum inference or non-commutative (von Neumann algebra) settings.
Structured inference: The framework facilitates incorporation of richer constraint structures as they arise in applications such as coding, statistical learning, or physical inference with non-standard symmetries.
Alternative information logics: The approach suggests that alternative inference strategies should be interpreted as representing different logics of information, classifiable by the reformulation invariances they do or do not commit to.

Numerical and Logical Claims

Uniqueness: The theorems prove that, under the specified invariances, only minimization of the specified divergence family is logically admissible; no other continuous, expressive, measure-consistent inference procedure exists.
Extension: The representation holds on all $\sigma$-finite measurable spaces, without recourse to functional analysis or limiting arguments for Radon–Nikodym derivatives except as needed to pass to densities.

Conclusion

The paper establishes reformulation invariance as a unifying and foundational principle for statistical inference, providing a comprehensive axiomatic and categorical rationale for the canonical status of classical divergences in statistics and information theory. The logical structure is made explicit and shown to hold across the discrete-continuum divide, integrating prior Bayesian and maximum entropy methodologies and bridging order-theoretic and categorical information geometric perspectives. This formalism sets an agenda for future inference theory targeting more general spaces, alternative logics, and structured domain knowledge in AI, statistics, and beyond.
