
Group invariance of $f$-divergences and the Fisher--Rao distance

Published 24 Jun 2026 in math.ST and cs.IT | (2606.25790v1)

Abstract: Many statistical models have natural symmetries described by a group action. We study how such symmetries affect the comparison of two distributions. We work with a transformation model in which a group acts on both the sample space and the parameter space, and the densities transform with a multiplier. Under this assumption, we show that every $f$-divergence is invariant under the group action. As a consequence, an invariant divergence depends only on a maximal invariant of the pair of parameters. When the action on the parameter space is transitive, this maximal invariant is given by a double coset. We apply this result to multidimensional location-scale families, and we show that the same reduction applies to the Fisher--Rao distance.

Authors (2)

Summary

  • The paper's main contribution is proving that, in transformation models, every f-divergence is invariant under the group action, so comparisons reduce to the relative position of the two parameters.
  • It develops a framework for maximal invariants in transformation models, applying double-coset reduction to location-scale families together with a spectral analysis.
  • The study further demonstrates that the Fisher–Rao metric is affine invariant, simplifying computations and linking geometric structures to symmetric spaces.

Group Invariance of $f$-Divergences and the Fisher–Rao Distance

Overview

The paper "Group invariance of $f$-divergences and the Fisher--Rao distance" (2606.25790) provides a rigorous treatment of how group symmetries interact with statistical divergences, specifically $f$-divergences and the Fisher–Rao distance. The authors formalize the invariance properties of these divergences under group actions and explain the implications for the structure and computation of divergences between distributions, particularly in models with natural symmetries such as location-scale families.

Formalization of Group Actions and Invariant Divergences

The paper establishes a general framework for transformation models in which a group $G$ acts measurably on both the sample space $X$ and the parameter space $\Theta$, with densities transforming according to a multiplier $\chi$: $\chi(g)\,p(gx\mid g\theta)=p(x\mid\theta)$, where the base measure satisfies $\lambda(gB)=\chi(g)\lambda(B)$. In other words, applying the group action simultaneously to the data and the parameters maps the model onto itself. The main result is that every $f$-divergence, defined as

$$D_f(\theta_1 : \theta_2) = \int_X f\left(\frac{p(x\mid\theta_2)}{p(x\mid\theta_1)}\right) p(x\mid\theta_1)\, \lambda(dx),$$

is invariant under the group action: $D_f(g\theta_1 : g\theta_2) = D_f(\theta_1 : \theta_2)$ for all $g \in G$.
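
The invariance can be checked numerically in the simplest case. The sketch below is only an illustration, assuming $d = 1$, a standard normal base density, and Lebesgue measure; the helper names (`density`, `f_divergence`, `act`) are hypothetical. It evaluates $D_f$ by quadrature for the KL generator $f(t) = -\log t$ and the squared-Hellinger generator $f(t) = (\sqrt{t}-1)^2$, then applies the same affine map to both parameters.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm


def density(x, mu, sigma):
    # Location-scale density p(x | mu, sigma) = f0((x - mu) / sigma) / sigma with a
    # standard normal base f0 (an illustrative assumption). Because f0 is symmetric,
    # the sign of the scale does not matter: the d = 1 case of GL(1)/O(1).
    return norm.pdf((x - mu) / sigma) / sigma


def f_divergence(f, theta1, theta2, width=12.0):
    # D_f(theta1 : theta2) = integral of f(p(x|theta2) / p(x|theta1)) p(x|theta1) dx,
    # truncated to a window covering `width` standard deviations of both densities.
    (mu1, s1), (mu2, s2) = theta1, theta2
    lo = min(mu1 - width * s1, mu2 - width * s2)
    hi = max(mu1 + width * s1, mu2 + width * s2)

    def integrand(x):
        p1 = density(x, mu1, s1)
        return f(density(x, mu2, s2) / p1) * p1

    value, _ = quad(integrand, lo, hi, limit=200)
    return value


def act(g, theta):
    # The affine map x -> a*x + b sends (mu, sigma) to (a*mu + b, |a|*sigma).
    a, b = g
    mu, sigma = theta
    return (a * mu + b, abs(a) * sigma)


kl = lambda t: -np.log(t)                  # gives KL(p1 || p2) in this convention
hellinger = lambda t: (np.sqrt(t) - 1.0) ** 2

theta1, theta2, g = (0.0, 1.0), (1.0, 1.5), (-2.0, 3.0)
for name, f in [("KL", kl), ("Hellinger", hellinger)]:
    print(name, f_divergence(f, theta1, theta2), f_divergence(f, act(g, theta1), act(g, theta2)))
```

Both columns agree to quadrature accuracy. The integration window is itself mapped by the same affine transformation, so even the truncated integrals are related by an exact change of variables.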

This invariance implies that the divergence depends exclusively on the "relative position" of the parameters, necessitating a treatment via maximal invariants.

Maximal Invariants and Double Coset Reduction

For transitive group actions on the parameter space, the paper draws on classical results in group theory and statistics to identify maximal invariants for pairs of parameters via double cosets $H g_2^{-1} g_1 H$. Here $H$ is the stabilizer subgroup of a base point $\theta_0$, and $g_1, g_2 \in G$ are representatives of $\theta_1 = g_1\theta_0$ and $\theta_2 = g_2\theta_0$. The divergence value is thus a function of this maximal invariant, which reduces the original problem to a parameterization of the orbit space $H\backslash G/H$.
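
A short sketch shows why the double coset is the right label, assuming $G = \mathrm{Aff}(d)$ with elements stored as hypothetical pairs `(b, A)` acting by $x \mapsto Ax + b$. The relative element $g_2^{-1} g_1$ does not change when both parameters are moved by a common $g$. Changing representatives $g_i \mapsto g_i h_i$ with $h_i$ in the stabilizer multiplies it by $h_2^{-1}$ on the left and $h_1$ on the right, so only its double coset is well defined.

```python
import numpy as np

# An element of Aff(d) is stored as a pair (b, A) acting by x -> A @ x + b.


def compose(g, h):
    """Return g∘h, the map x -> g(h(x))."""
    b_g, A_g = g
    b_h, A_h = h
    return (A_g @ b_h + b_g, A_g @ A_h)


def inverse(g):
    b, A = g
    A_inv = np.linalg.inv(A)
    return (-A_inv @ b, A_inv)


def relative(g1, g2):
    """Relative element g2^{-1} g1; its double coset H g2^{-1} g1 H labels the pair."""
    return compose(inverse(g2), g1)


rng = np.random.default_rng(0)
d = 3


def random_affine():
    # A random affine map; adding 3*I makes a singular linear part unlikely.
    return (rng.normal(size=d), rng.normal(size=(d, d)) + 3.0 * np.eye(d))


g1, g2, g = random_affine(), random_affine(), random_affine()
r = relative(g1, g2)
r_moved = relative(compose(g, g1), compose(g, g2))   # act on both parameters by g
print(np.allclose(r[0], r_moved[0]) and np.allclose(r[1], r_moved[1]))
```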

This principle is realized concretely in location-scale families on $\mathbb{R}^d$, where the maximal invariants for pairs are described through singular values and block norms (derived from the scale and location differences, with orthogonal rotations factored out). The reduction is established both for general $f$-divergences and for the Fisher--Rao metric.

Explicit Construction in Location-Scale Families

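In multidimensional location-scale models, the sample space is $\mathbb{R}^d$, and the affine group $\mathrm{Aff}(d)$ acts naturally. The authors show that one must pass from the conventional parameter space $\mathbb{R}^d \times GL(d)$ to the quotient $\mathbb{R}^d \times (GL(d)/O(d))$ to ensure identifiability and proper invariance: for a radial base density, the scale matrices $V$ and $VO$ with $O \in O(d)$ yield the same distribution.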
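
In code, a coset $[V] \in GL(d)/O(d)$ needs a concrete label. One convenient option, which is our assumption and not necessarily the paper's parameterization, is the symmetric positive definite factor $P$ of the left polar decomposition $V = PU$. Here $P = (VV^\top)^{1/2}$, and right-multiplying $V$ by an orthogonal matrix changes only $U$. The function name `canonical_scale` is hypothetical.

```python
import numpy as np
from scipy.linalg import polar


def canonical_scale(V):
    """Representative of the coset [V] = {V @ O : O orthogonal} in GL(d)/O(d).

    The left polar decomposition V = P @ U has P symmetric positive definite and
    U orthogonal; replacing V by V @ O only changes U, so P labels the coset.
    """
    U, P = polar(V, side="left")
    return P


rng = np.random.default_rng(0)
V = rng.normal(size=(3, 3)) + 3.0 * np.eye(3)
O, _ = np.linalg.qr(rng.normal(size=(3, 3)))        # a random orthogonal matrix
print(np.allclose(canonical_scale(V), canonical_scale(V @ O)))   # same coset, same label
```

A Cholesky factor of $VV^\top$ would serve equally well as a label. The polar factor is used here because it treats all coordinate directions symmetrically.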

The reduction yields that invariant divergences depend only on:

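  • The singular values (with their multiplicities) of $S = V_2^{-1}V_1$ (relative scale transformation)
  • The block norms of $U^\top \nu$, where $U$ holds the left singular vectors from the SVD $S = U\Sigma W^\top$ and $\nu = V_2^{-1}(\mu_1 - \mu_2)$ (relative location, split into blocks according to the groups of equal singular values of the scale transformation)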

This structure provides an explicit, minimal summary of the information relevant to any invariant divergence between two distributions in such models.
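
The two invariants above can be computed with one SVD. The sketch below assumes parameters given as $(\mu, V)$ with $\Sigma = VV^\top$. It uses a hypothetical relative tolerance `tol` to decide when singular values count as equal, and the function name `pair_invariants` is illustrative. The usage check applies a common affine map and independent representative changes $V_i \mapsto V_i O_i$, neither of which should change the output.

```python
import numpy as np


def pair_invariants(mu1, V1, mu2, V2, tol=1e-9):
    """Return [(singular value, multiplicity, block norm), ...] for the pair.

    S = V2^{-1} V1 is the relative scale and nu = V2^{-1}(mu1 - mu2) the relative
    location; U^T nu is split into blocks of (numerically) equal singular values.
    """
    S = np.linalg.solve(V2, V1)
    nu = np.linalg.solve(V2, mu1 - mu2)
    U, sing, _ = np.linalg.svd(S)          # singular values in descending order
    w = U.T @ nu
    blocks, start = [], 0
    for i in range(1, len(sing) + 1):
        # Close the current block at the end or when the next singular value differs.
        if i == len(sing) or sing[i - 1] - sing[i] > tol * max(1.0, sing[0]):
            blocks.append((sing[start], i - start, np.linalg.norm(w[start:i])))
            start = i
    return blocks


rng = np.random.default_rng(1)
d = 4
mu1, mu2 = rng.normal(size=d), rng.normal(size=d)
V1 = rng.normal(size=(d, d)) + 3.0 * np.eye(d)
V2 = rng.normal(size=(d, d)) + 3.0 * np.eye(d)
A, b = rng.normal(size=(d, d)) + 3.0 * np.eye(d), rng.normal(size=d)
O1, _ = np.linalg.qr(rng.normal(size=(d, d)))
O2, _ = np.linalg.qr(rng.normal(size=(d, d)))

before = np.array(pair_invariants(mu1, V1, mu2, V2))
after = np.array(pair_invariants(A @ mu1 + b, A @ V1 @ O1, A @ mu2 + b, A @ V2 @ O2))
print(np.allclose(before, after))
```

Block norms, rather than individual coordinates of $U^\top\nu$, are needed because $U$ is only determined up to an orthogonal change of basis inside each block of equal singular values.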

Invariance in Information Geometry: Bregman and Fenchel–Young Divergences

The paper extends the analysis to canonical divergences in information geometry, especially Bregman and Fenchel–Young divergences in dually flat spaces induced by log-determinant potentials on the cone of positive definite matrices. Under congruence actions, these divergences are shown to be spectral, i.e., they depend only on the eigenvalue spectrum of the relative scale transformation. The dual actions on the two coordinate systems are shown to be mutually consistent, so the invariance claims hold in either coordinate system.
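
As a minimal sketch of the spectral property, assume the potential $F(X) = -\log\det X$ on $SPD(d)$ (the paper's exact normalization may differ). Its Bregman divergence is $D_F(X, Y) = \operatorname{tr}(Y^{-1}X) - \log\det(Y^{-1}X) - d$, which depends only on the eigenvalues $\lambda_i$ of $Y^{-1}X$. Under the congruence $X \mapsto AXA^\top$, the matrix $Y^{-1}X$ changes by a similarity transform, so those eigenvalues are unchanged. The function names are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh


def logdet_bregman(X, Y):
    """Bregman divergence of F(X) = -log det X on SPD(d):
    D_F(X, Y) = tr(Y^{-1} X) - log det(Y^{-1} X) - d."""
    d = X.shape[0]
    M = np.linalg.solve(Y, X)
    _, logdet = np.linalg.slogdet(M)
    return np.trace(M) - logdet - d


def logdet_bregman_spectral(X, Y):
    """Same divergence via the eigenvalues lam of Y^{-1} X: sum(lam - log lam - 1)."""
    lam = eigh(X, Y, eigvals_only=True)   # generalized problem X v = lam Y v
    return np.sum(lam - np.log(lam) - 1.0)


def random_spd(rng, d):
    B = rng.normal(size=(d, d))
    return B @ B.T + d * np.eye(d)


rng = np.random.default_rng(2)
X, Y = random_spd(rng, 3), random_spd(rng, 3)
A = rng.normal(size=(3, 3)) + 3.0 * np.eye(3)   # congruence action X -> A X A^T
print(logdet_bregman(X, Y), logdet_bregman_spectral(X, Y),
      logdet_bregman(A @ X @ A.T, A @ Y @ A.T))
```

This particular divergence equals twice $\mathrm{KL}(\mathcal{N}(0, X) \,\|\, \mathcal{N}(0, Y))$, which links it back to the $f$-divergence picture for zero-mean Gaussians.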

Fisher–Rao Metric: Group Invariance and Reduction

The Fisher–Rao metric, as a Riemannian structure on statistical models, is demonstrated to possess affine invariance in the location-scale setting. The reduction principle applies, yielding that the Fisher–Rao distance between two distributions is a function of the canonical relative position as described above. The paper details the induced metrics, tangent space structure, and pullback metrics on quotient parameterizations, confirming non-degeneracy and structural compatibility.

Explicit computations for the multivariate normal model are provided. Here the Fisher metric takes its standard form, and the geodesic distance becomes a function of relative location and scale, with closed-form expressions available for $d = 1$ and explicit representations of geodesics for $d > 1$.
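
For $d = 1$ the reduction is easy to see in the standard closed form. The Fisher metric of $\mathcal{N}(\mu, \sigma^2)$ is $(d\mu^2 + 2\,d\sigma^2)/\sigma^2$, which is twice the hyperbolic half-plane metric in the coordinates $(\mu/\sqrt{2}, \sigma)$. The resulting distance depends only on $s = \sigma_1/\sigma_2$ and $\nu = (\mu_1 - \mu_2)/\sigma_2$. The sketch below also includes the known equal-mean formula for general $d$, $\sqrt{\tfrac12 \sum_i \log^2 \lambda_i}$ with $\lambda_i$ the eigenvalues of $\Sigma_2^{-1}\Sigma_1$. These are textbook formulas rather than the paper's derivation, and the function names are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh


def fisher_rao_normal_1d(mu1, sigma1, mu2, sigma2):
    """Fisher–Rao distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    The Fisher metric (dmu^2 + 2 dsigma^2) / sigma^2 is twice the hyperbolic
    half-plane metric in the coordinates (mu / sqrt(2), sigma).
    """
    arg = 1.0 + ((mu1 - mu2) ** 2 / 2.0 + (sigma1 - sigma2) ** 2) / (2.0 * sigma1 * sigma2)
    return np.sqrt(2.0) * np.arccosh(arg)


def fisher_rao_normal_1d_reduced(nu, s):
    """The same distance in the invariants nu = (mu1 - mu2)/sigma2, s = sigma1/sigma2."""
    return np.sqrt(2.0) * np.arccosh(1.0 + (nu ** 2 / 2.0 + (s - 1.0) ** 2) / (2.0 * s))


def fisher_rao_normal_same_mean(Sigma1, Sigma2):
    """Equal means, any d: sqrt(sum(log(lam)^2) / 2), lam = eigenvalues of Sigma2^{-1} Sigma1."""
    lam = eigh(Sigma1, Sigma2, eigvals_only=True)
    return np.sqrt(0.5 * np.sum(np.log(lam) ** 2))


mu1, s1, mu2, s2 = 0.3, 1.2, -1.0, 0.7
a, b = -2.5, 4.0                                   # affine map x -> a*x + b
print(fisher_rao_normal_1d(mu1, s1, mu2, s2),
      fisher_rao_normal_1d_reduced((mu1 - mu2) / s2, s1 / s2),
      fisher_rao_normal_1d(a * mu1 + b, abs(a) * s1, a * mu2 + b, abs(a) * s2))
```

All three printed values coincide, and with equal means both formulas give $\sqrt{2}\,\lvert\log(\sigma_1/\sigma_2)\rvert$ for $d = 1$.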

Connections to Symmetric Spaces and Alternative Riemannian Structures

The paper also discusses alternative geometric parameterizations of the multivariate normal manifold, such as Lovrić, Min-Oo, and Ruh's identification with the symmetric space $SL(d+1,\mathbb{R})/SO(d+1)$. This identification induces invariant metrics that differ from the Fisher metric but obey the same reduction principle in terms of group invariants (relative positions parametrized via determinant-normalized congruence classes).

Implications and Future Directions

These results provide a general and unified rationale for the structure of invariant divergences and metrics in statistical models with group symmetries. Practically, such reductions simplify computation, clarify necessary statistics, and inform the design of invariant estimators and hypothesis tests. Theoretically, the explicit identification of maximal invariants in orbit spaces refines foundational understanding in information geometry and statistical decision theory.

Potential future directions include:

  • Extending invariant reduction principles to other statistical models, including discrete distributions and nonparametric settings
  • Developing efficient algorithms for computation of divergences and geodesic distances based solely on invariant quantities
  • Investigating the interplay between invariance, data augmentation, and equivariant learning protocols in deep generative models

Conclusion

The paper rigorously establishes that $f$-divergences and the Fisher--Rao distance are invariant under natural group actions in statistical models, and explicitly describes the associated maximal invariants for pairs of parameters. In multidimensional location-scale families, this yields a canonical reduction to quantities parameterizing relative position modulo symmetry, with implications for both theoretical understanding and practical computation of statistical divergences and metrics. The framework unifies invariant analysis across a broad class of statistical and geometric divergences, grounding them in explicit orbit space structures.


Explain it Like I'm 14

Overview: What is this paper about?

This paper studies symmetry in statistics and how it helps compare two probability distributions. It shows that many popular “difference measures” between distributions (called f-divergences, like Kullback–Leibler or Hellinger) and a geometric distance (the Fisher–Rao distance) do not change when you move or rescale both distributions in the same way. Because of this invariance, the paper explains how to rewrite these comparisons using a smaller set of “essential” variables that capture only the relative position and size between the two distributions.

Goals and questions

The authors set out to answer a few simple-sounding questions:

  • If a statistical model has natural symmetries (like shifting or scaling), do standard measures of difference between two distributions stay the same when we apply those symmetries to both distributions?
  • What information about the two distributions really matters once we ignore those symmetries?
  • Can we describe that leftover, essential information in a clean, general way?
  • How does this work for common models built from shifting and scaling a base shape (location–scale families)?
  • Does the same idea apply to the Fisher–Rao distance, a geometric way to measure how far two distributions are on a curved “shape” of all distributions?

Methods: How did they study it?

Think of a “group action” as a set of moves you can make—like shifting all data by the same amount, or stretching it in every direction—that follow consistent rules. A “statistical model” gives you a family of distributions (recipes for how likely outcomes are) that can be changed by these moves.

  • Transformation model: The group acts both on the data (sample space) and on the parameters (like mean and scale). The probability density transforms in a way that accounts for stretching or compressing (using a “multiplier” that adjusts volume).
  • Invariance: An f-divergence compares two distributions. The key trick is: if you apply the same group move to both distributions, the integral formula for the divergence stays the same (invariance).
  • Maximal invariant: Imagine collecting all versions of a parameter pair you get by applying all allowed moves. That collection is an “orbit.” A maximal invariant is a way to label or pick the essential information for an orbit—what remains after you ignore all symmetric moves. Once you know a maximal invariant, any invariant divergence can be written as a function of it.
  • Double coset (transitive case): When the group acts richly enough on the parameter space (transitive action), the essential relative position of two parameters is captured by an object called a “double coset.” You can think of it as a standardized label of how one parameter sits relative to the other after you remove the freedom to move them around.
  • Location–scale families: These are models where you take a base distribution and get new ones by shifting (location) and stretching (scale). In multiple dimensions, stretching can happen differently in different directions. The authors use a careful “quotient” description to ignore irrelevant rotations of the scale and then use a singular value decomposition (SVD) to break the relative scale into simple pieces (the singular values). They also split the relative location into blocks aligned with those stretches and summarize each block by its size (its norm).
  • Fisher–Rao distance: This is a distance defined by the Fisher information geometry. Using the same symmetry ideas (affine invariance), you can reduce the problem of finding the distance between two distributions to finding the distance between a transformed “canonical” pair, which is simpler to study.

Main findings and why they matter

Here are the key results, explained simply:

  • All f-divergences are invariant under the group action when the densities transform appropriately. If you move or scale both distributions in the same way, their f-divergence doesn’t change. This includes many popular measures like KL, Hellinger, total variation, and chi-square.
  • Any invariant divergence depends only on the maximal invariant of the pair of parameters. In the transitive case, that maximal invariant can be written as a double coset, which labels the pair’s relative position after removing symmetry.
  • For multidimensional location–scale families:
    • The essential relative scale is captured by the singular values of the matrix that maps one scale to the other (these are the amounts of stretching in each principal direction).
    • The essential relative location is captured by “block norms”—the sizes of parts of the relative shift aligned with groups of equal singular values. Together, these singular values and block norms are all that any invariant f-divergence can depend on.
  • The same reduction applies to the Fisher–Rao distance. Because of affine invariance, you can transform any pair of distributions to a canonical form (one is the base reference), and the distance depends only on the same kinds of invariant variables (relative scale singular values and block norms of the transformed relative location).
  • For certain dually flat divergences (like Bregman or Fenchel–Young), the paper shows a matching invariance under dual group actions. In scale-only models, these divergences depend only on the eigenvalues of the relative scale (they are “spectral” divergences), reinforcing the same symmetry story.

Why this matters:

  • It simplifies formulas and calculations. Instead of handling all parameters directly, you reduce to a small set of invariant quantities.
  • It explains why many different divergence formulas end up depending on the same “spectral” or relative features.
  • It improves understanding of what truly affects a comparison between two distributions, and what is just a matter of coordinate choice (like a common shift or rotation).

Implications: What is the potential impact?

  • Cleaner, more general theory: The results unify many divergence formulas under the same symmetry principle, showing they all depend on the same essential “relative” data.
  • Practical simplification: In applications (statistics, machine learning, signal processing), you can transform problems into canonical forms and compute divergences or distances using only the invariant variables (singular values and block norms), saving effort and avoiding clutter from irrelevant coordinates.
  • Geometric insight: For the Fisher–Rao distance and related geometries, the paper clarifies how symmetry reduces the problem and highlights what variables determine the distance.
  • Foundation for further work: Even when closed-form answers are hard, knowing the exact invariant variables guides numerical methods, bounds, and approximations, and helps design invariant algorithms and tests.

In short, the paper shows that symmetry is not just a mathematical nicety—it is a powerful tool that strips away distractions, leaving only the core, relative features that truly matter when comparing two statistical distributions.

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Below is a single, actionable list of what remains missing, uncertain, or unexplored in the paper.

  • Characterize precisely when the transformation model assumption holds: give necessary and sufficient conditions on the base density and measure for equations like $\chi(g)\,p(gx\mid g\theta)=p(x\mid\theta)$ and $\lambda(gB)=\chi(g)\lambda(B)$ to be satisfied beyond the radial/affine cases considered.
  • Extend the invariance results to non-radial base densities: determine which classes of anisotropic/elliptical (or more general) $f_0$ still yield invariance under appropriate group actions, and how the maximal invariants change.
  • Identify minimal assumptions on $f_0$ for identifiability in the quotient parameterization: provide necessary and sufficient conditions (beyond positivity, radiality, and finite second moment) under which $(\mu,[V])$ is identifiable; analyze heavy-tailed cases (e.g., Student-$t$, Cauchy) where moments may not exist.
  • Provide explicit, closed-form expressions for commonly used $f$-divergences as functions of the proposed invariants: for concrete models (Gaussian, Laplace, Student-$t$), derive $D_f$ (e.g., KL, Hellinger, $\chi^2$, total variation) in terms of the singular values of $V_2^{-1}V_1$ and the block norms from $V_2^{-1}(\mu_1-\mu_2)$.
  • Develop constructive, canonical selections of maximal invariants for general homogeneous spaces: beyond location–scale, give algorithmic procedures (not relying on the axiom of choice) to compute maximal-invariant representatives of double cosets $H\backslash G/H$ and analyze their measurability/continuity.
  • Analyze the topology/geometry of the double-coset space $H\backslash G/H$ in general: establish when it admits a smooth/stratified manifold structure, how singularities arise (e.g., due to eigenvalue multiplicities), and how this affects continuity of invariant formulas.
  • Quantify stability of the invariant parametrization under eigenvalue multiplicity changes: study continuity and differentiability of $D_f$ as singular values coalesce and block structures change, and design robust numerical schemes that avoid instability near multiplicity transitions.
  • Address models with parameter-dependent support or zeros in the density: determine how invariance and $f$-divergence definitions extend when $p(\cdot\mid\theta)$ vanishes on sets of positive measure or has support varying with $\theta$.
  • Extend the reduction principle to non-transitive parameter actions: provide explicit maximal invariants for $\Theta$ where the $G$-action is not transitive, and characterize $D_f$ as a function of these invariants for such models.
  • Generalize beyond $f$-divergences and Fisher–Rao: identify other divergences/distances (e.g., $\alpha$-divergences, Wasserstein distances) that exhibit analogous group invariance and reduction to maximal invariants; specify necessary conditions on the divergence and group.
  • Classify congruence-invariant Bregman/Fenchel–Young divergences on $SPD(d)$: beyond the log-determinant potential, characterize all convex generators $F$ whose Bregman/FY divergences are invariant under $A\bullet\eta=A\eta A^\top$ (or the dual action), and relate them to spectral divergences.
  • Bridge the $f$-divergence and dually flat perspectives: identify when $f$-divergences on $SPD(d)$ coincide with (or can be represented by) congruence-invariant Bregman divergences, and determine the classes of $f$ for which this is possible.
  • Fisher–Rao distance: obtain explicit formulas or efficient algorithms in the reduced coordinates for multivariate normals in $d>1$; quantify approximation errors for existing surrogates and exploit the invariance to design practical solvers for the two-point boundary-value problem.
  • Analyze geometric properties of the quotient geometry: study curvature, geodesic completeness, and convexity of geodesic balls for the pullback metric on $\mathbb{R}^d\times (GL(d)/O(d))$, and relate them to properties of $D_f$ and $d_{\mathrm{FR}}$.
  • Compare and relate the affine-invariant Fisher–Rao geometry to the symmetric-space construction of Lovrić–Min-Oo–Ruh: provide explicit transformations between distances, quantify differences, and identify scenarios where one geometry yields advantages (closed forms, numerical stability).
  • Develop statistical procedures leveraging the invariant reduction: construct invariant tests, confidence sets, or estimators that depend only on the maximal invariants; analyze their optimality (minimax/invariance) and sampling distributions.
  • Consider approximate invariance and robustness: define and analyze divergences under approximate group symmetry, providing perturbation bounds for $D_f$ when the model only nearly satisfies the transformation model.
  • Extend to discrete sample spaces and other measures: verify or adapt the multiplier/quasi-invariance framework for counting or mixed measures and characterize the corresponding maximal invariants.
  • High-dimensional computation: design numerically stable and scalable methods to compute singular values and block norms, including randomized SVD variants and error bounds, to make the invariant reduction practical for large $d$.
  • Random matrix and probabilistic analysis of invariants: derive the distributional behavior of the singular values and block norms under natural priors or sampling models, enabling Bayesian or frequentist inference on divergences through invariant coordinates.
  • Infinite-dimensional extensions: explore whether analogous invariance and maximal-invariant reductions hold for functional data or Gaussian processes under diffeomorphism or unitary actions, and identify analytical obstacles (e.g., non-separable spaces, lack of Haar measure).
  • Alternative canonical parameterizations: investigate whether parameterizations other than $GL(d)/O(d)$ (e.g., via Cholesky or polar decompositions with constraints) yield simpler or more stable maximal invariants and formulas for $D_f$ and $d_{\mathrm{FR}}$.

Practical Applications

Practical Applications Derived from the Paper’s Findings

This paper shows that many statistical dissimilarities (f-divergences, Fisher–Rao distance, and certain Bregman/Fenchel–Young divergences) are invariant under natural group actions and, crucially, depend only on maximal invariants of parameter pairs. In location–scale models, any invariant divergence between two distributions reduces to functions of (i) the singular values of the relative scale S = V₂⁻¹V₁ and (ii) block norms of a transformed relative location ν = V₂⁻¹(μ₁−μ₂). This reduction yields canonical, symmetry-respecting signatures for comparing distributions, enabling simpler, more robust workflows across sectors.

Immediate Applications

The following can be implemented now with standard linear algebra (SVD/eigendecomposition), existing statistical libraries, and routine numerical optimization.

  • Affine-invariant drift detection in sensor networks and IoT (Industry: robotics, energy, manufacturing)
    • Use case: Detect distribution shifts in streaming sensor data that are due to real changes, not to re-calibration, unit conversions, or frame changes.
    • Workflow: Estimate (μ, V) for sliding windows; compute S = V₂⁻¹V₁, ν = V₂⁻¹(μ₁−μ₂); compute invariant signatures (singular values of S and block norms of Uᵀν); evaluate an f-divergence expressed as a function of these invariants; alert if threshold crossed.
    • Tools/products: An “Invariant Shift Monitor” module for edge gateways; integration with Kafka/Flink pipelines.
    • Assumptions/dependencies: Transformation model reflects domain symmetries (location/scale changes, linear transforms); windows large enough for stable estimates; V₂ invertible and SPD where needed.
  • Robust A/B testing and online experimentation under re-scaling/re-basing (Industry: software, e-commerce, ad-tech)
    • Use case: Compare treatment/control distributions when metrics are re-scaled (e.g., currency inflation, unit changes) or re-based (e.g., baseline offsets).
    • Workflow: Map both variants to canonical form by applying x ↦ V₂⁻¹(x − μ₂), which sends the control distribution to the base distribution; compute invariant divergences (e.g., KL, Hellinger) as functions of the SVD invariants; report effects unaffected by nuisance transformations.
    • Tools/products: Add-on to experimentation platforms (e.g., LaunchDarkly/Optimizely) providing “affine-invariant effect size.”
    • Assumptions: Metric distributions well-modeled by location–scale families (or reasonable approximations).
  • Covariance monitoring via spectral divergences (Finance, Healthcare, Cybersecurity)
    • Use case: Track changes in risk (finance), physiology (healthcare), or threat profiles (security) represented by SPD covariance matrices.
    • Workflow: Compare Σ₁ and Σ₂ using spectral divergences (functions of eigenvalues of Σ₂⁻¹Σ₁); optionally use the Bregman divergence with log-det potential for congruence invariance.
    • Tools/products: “Spectral Divergence Dashboard” for risk/physiology monitoring; Python/R package for SPD spectral distances.
    • Assumptions: SPD matrices well-estimated; congruence action models the domain (e.g., linear mixing/rotations).
  • Domain shift detection in vision and audio with illumination/contrast/volume invariance (Industry: computer vision, speech)
    • Use case: Compare feature distributions across conditions (lighting, contrast, gain) without confounding nuisances.
    • Workflow: Fit location–scale models to features; canonicalize pairs using V₂⁻¹; compute invariant divergences on SVD invariants; trigger retraining or adaptation.
    • Tools/products: Plug‑in for MLOps monitoring (e.g., WhyLabs/Fiddler) that reports invariant drifts.
    • Assumptions: Feature distributions approximately elliptical/radial; affine approximations sufficient.
  • Medical imaging harmonization across scanners (Healthcare)
    • Use case: Compare or harmonize MRI/CT intensity distributions across vendors/sites without bias from scanner scaling or baselines.
    • Workflow: Estimate (μ, V) from regions of interest; reduce to S and ν; use invariant f-divergence or Fisher–Rao reduction to quantify harmonization gaps.
    • Tools/products: PACS-compatible QC module for cross-site harmonization audits.
    • Assumptions: Approximate location–scale behavior after standard pre-processing; identifiability ensured via quotient GL/O.
  • Affine-invariant two-sample and goodness-of-fit tests (Academia, Industry)
    • Use case: Hypothesis testing where nuisance transformations (translation, scaling, congruence) should be ignored.
    • Workflow: Construct tests using maximal invariants (double coset or SVD-based invariants); critical values via permutation/bootstrapping on invariant statistics.
    • Tools/products: Statistical libraries offering “group-invariant tests” for common families (Gaussian, elliptical).
    • Assumptions: Group action well-specified and transitive on parameter space; finite-moment and integrability conditions satisfied.
  • Canonical pair reduction layers for ML pipelines (Software/ML)
    • Use case: Stabilize loss/metrics by removing nuisance parameters before comparing distributions.
    • Workflow: Implement a “Canonical Pair Reducer” that maps (μ₁,V₁),(μ₂,V₂) to (Σ, r)-type invariants (singular values, block norms); feed to divergence modules.
    • Tools/products: PyTorch/TensorFlow layers for invariant comparisons; scikit-learn transformers for SPD and location–scale data.
    • Assumptions: Data fits transformation model; numerical stability of SVD in high dimensions.
  • Fair and comparable cross-country/agency metrics (Policy, Public sector)
    • Use case: Compare distributions (e.g., incomes, emissions) across units with different scales and baselines.
    • Workflow: Apply invariant divergences to summarize differences ignoring units and baselines, focusing on shape/relative spread.
    • Tools/products: Statistical policy dashboards reporting affine-invariant divergence scores.
    • Assumptions: Appropriate pre-normalization; interpretations validated with domain experts.
  • Device calibration and QA in manufacturing and daily life (Industry, Consumer electronics)
    • Use case: Verify that two devices (scales, thermometers, microphones) produce statistically similar outputs up to affine calibration.
    • Workflow: Collect paired measurements, estimate (μ,V), compute invariant divergence; accept/reject based on thresholds.
    • Tools/products: Embedded QC apps; smartphone apps for sensor sanity checks.
    • Assumptions: Sufficient sample size; affine model reasonable for device’s operating range.
  • Privacy-aware change detection via invariant signatures (Cross-sector)
    • Use case: Share only invariant summaries (singular values, block norms) for inter-org monitoring to reduce exposure of raw parameters.
    • Workflow: Compute and transmit invariant signatures; central service computes divergence and alerts.
    • Tools/products: APIs for invariant signature exchange.
    • Assumptions: Invariants retain enough sensitivity for the task; privacy risk assessment needed.
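
The drift-detection workflow in the first item above can be sketched end to end, assuming Gaussian windows. Under that assumption, the standard Gaussian KL divergence can be written entirely in the invariants. With $s_i$ the singular values of $S = V_2^{-1}V_1$ and $\nu = V_2^{-1}(\mu_1 - \mu_2)$, it equals $\tfrac12\big[\sum_i (s_i^2 - 1 - 2\log s_i) + \lVert\nu\rVert^2\big]$, and $\lVert\nu\rVert^2$ is the sum of the squared block norms. The helper names and the threshold `0.1` are hypothetical placeholders, not values from the paper.

```python
import numpy as np


def window_params(X):
    """Sample mean and a square root V of the sample covariance (V @ V.T = Sigma)."""
    mu = X.mean(axis=0)
    V = np.linalg.cholesky(np.cov(X, rowvar=False))
    return mu, V


def gaussian_kl_from_invariants(mu1, V1, mu2, V2):
    """KL(N(mu1, V1 V1^T) || N(mu2, V2 V2^T)) computed from the pair invariants.

    With s_i the singular values of S = V2^{-1} V1 and nu = V2^{-1}(mu1 - mu2):
    KL = (sum_i (s_i^2 - 1 - 2 log s_i) + |nu|^2) / 2.
    """
    S = np.linalg.solve(V2, V1)
    nu = np.linalg.solve(V2, mu1 - mu2)
    s = np.linalg.svd(S, compute_uv=False)
    return 0.5 * (np.sum(s ** 2 - 1.0 - 2.0 * np.log(s)) + nu @ nu)


def drift_alert(window1, window2, threshold=0.1):
    """Hypothetical alert rule: flag drift when the invariant KL exceeds `threshold`."""
    return gaussian_kl_from_invariants(*window_params(window1), *window_params(window2)) > threshold


rng = np.random.default_rng(3)
ref = rng.normal(size=(2000, 3))
same = rng.normal(size=(2000, 3))
shifted = rng.normal(size=(2000, 3)) + 1.0
A, b = np.diag([2.0, 0.5, 10.0]), np.array([5.0, -1.0, 0.0])   # a simulated recalibration
print(drift_alert(ref, same), drift_alert(ref, shifted),
      drift_alert(ref @ A.T + b, same @ A.T + b))
```

Because the score depends only on the invariants, applying the same recalibration $x \mapsto Ax + b$ to both windows leaves it unchanged up to rounding. A genuine mean shift, in contrast, raises it well above the threshold.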

Long-Term Applications

These require additional research, scaling, or domain adaptation (e.g., beyond location–scale/elliptical models, improved geodesic solvers, automated symmetry discovery).

  • Automated symmetry discovery and invariant divergence selection (Academia, Software)
    • Goal: Learn group actions from data and select divergences invariant to discovered symmetries; automate construction of maximal invariants beyond double cosets for complex models.
    • Dependencies: Advances in representation learning and causal discovery; theoretical guarantees for learned symmetries.
  • Closed-form or fast approximations for Fisher–Rao distances in multivariate normals and beyond (Academia, Industry)
    • Goal: Practical, scalable approximations of Fisher–Rao distances using the paper’s reduction to canonical forms; GPU-accelerated geodesic solvers.
    • Dependencies: Numerical geometry on SPD manifolds; efficient solvers for two-point boundary-value problems.
  • Invariant metrics for evaluation of deep generative models (Industry, Academia)
    • Goal: Replace ad hoc metrics with group-invariant divergences measuring distributional similarity while factoring out nuisance transformations (e.g., contrast/scale in images).
    • Dependencies: Extensions to nonparametric or implicit models; robust estimation of invariants from deep features.
  • Federated and privacy-preserving monitoring using invariant signatures (Cross-sector)
    • Goal: Share only invariant summaries to monitor population changes across institutions without revealing raw parameters or identities.
    • Dependencies: Differential privacy for invariant statistics; secure aggregation protocols.
  • Robustness certification under affine nuisances (Software, Safety-critical systems)
    • Goal: Certify ML systems for invariance to specified group actions by bounding invariant divergences under perturbations.
    • Dependencies: Verified SVD/eigensolvers; formal methods for group actions in ML.
  • Invariant control and planning in robotics (Robotics)
    • Goal: Use invariant divergences to compare sensor/feature distributions across poses and loads for adaptive control and transfer.
    • Dependencies: Extension to SE(3) and non-linear group actions; real-time invariant computation on embedded hardware.
  • Regulatory standards for comparable statistics across jurisdictions (Policy)
    • Goal: Define standardized, affine-invariant divergence benchmarks for cross-country metrics (e.g., inflation-adjusted distributions).
    • Dependencies: Consensus on modeling assumptions; stakeholder alignment.
  • Streaming, high-dimensional invariant analytics (Industry)
    • Goal: Real-time computation of SVD/eigen spectra for thousands of sensors/features, with adaptive rank and sketching.
    • Dependencies: Randomized linear algebra, sketching, and hardware acceleration.
  • Extending invariant reduction to non-Euclidean sample spaces and complex groups (Academia)
    • Goal: Maximal invariants and divergence invariance under diffeomorphism groups, Lie groups on manifolds, and beyond.
    • Dependencies: Differential geometry, harmonic analysis on groups, scalable algorithms.
  • Invariant hypothesis testing and confidence regions with nuisance parameters (Academia, Healthcare/Clinical trials)
    • Goal: Develop tests and intervals that depend only on maximal invariants, improving power and interpretability under nuisance transformations.
    • Dependencies: Asymptotic theory and finite-sample corrections; regulatory validation.

Notes on feasibility and assumptions that cut across applications:

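  • Transformation model validity: The underlying data should admit the assumed group action (e.g., the affine location–scale action, or the congruence action on symmetric positive definite (SPD) matrices). For the location–scale constructions, base densities are assumed radial/elliptical and strictly positive, with integrability (e.g., finite second moment).
  • Identifiability and quotienting: In the multivariate model, the scale matrix $V$ is identifiable only up to right multiplication by an orthogonal matrix ($V_1 = V_2 P$ with $P \in O(d,\mathbb{R})$ gives the same distribution). Implementations must therefore work on the quotient $GL(d,\mathbb{R})/O(d,\mathbb{R})$, for example through $\Sigma = VV^\top$, or use spectral decompositions.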
  • Numerical considerations: Stable SVD/eigen computations are required; care is needed in high dimensions or ill-conditioned cases (regularization/sketching may be necessary).
  • Model mismatch: If distributions deviate strongly from the assumed family, invariant divergences may misrepresent differences; diagnostic checks are recommended.
  • Computational cost: Fisher–Rao distances often require numeric geodesics (no general closed form for multivariate normals at d>1); approximations or spectral alternatives may be preferable in production.
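The spectral route can be illustrated on the centered normal scale family $N(0,\Sigma)$, where the Fisher–Rao distance does have a closed form, $d_{FR}(\Sigma_1,\Sigma_2) = \sqrt{\tfrac12 \sum_i \log^2 \lambda_i}$ with $\lambda_i$ the eigenvalues of $\Sigma_1^{-1}\Sigma_2$. The following is a minimal NumPy sketch under that assumption (zero mean, Gaussian base density); `fisher_rao_centered` and `random_orthogonal` are hypothetical helper names, not taken from the paper. It checks invariance under the congruence action $\Sigma \mapsto A\Sigma A^\top$ and shows why only the class of $V$ modulo $O(d,\mathbb{R})$ matters.

```python
import numpy as np


def fisher_rao_centered(sigma1: np.ndarray, sigma2: np.ndarray) -> float:
    """Fisher-Rao distance between N(0, sigma1) and N(0, sigma2).

    Computes sqrt(0.5 * sum_i log(lambda_i)^2), where lambda_i are the
    eigenvalues of sigma1^{-1} sigma2. A Cholesky whitening turns this into
    a symmetric eigenproblem so that eigvalsh can be used.
    """
    L = np.linalg.cholesky(sigma1)        # sigma1 = L L^T
    X = np.linalg.solve(L, sigma2)        # L^{-1} sigma2
    M = np.linalg.solve(L, X.T)           # L^{-1} sigma2 L^{-T}
    M = 0.5 * (M + M.T)                   # sym(M): remove round-off asymmetry
    lam = np.linalg.eigvalsh(M)           # same spectrum as sigma1^{-1} sigma2
    return float(np.sqrt(0.5 * np.sum(np.log(lam) ** 2)))


def random_orthogonal(rng: np.random.Generator, d: int) -> np.ndarray:
    """Orthogonal matrix from the QR factorization of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q


rng = np.random.default_rng(0)
d = 3
B1, B2 = rng.standard_normal((d, d)), rng.standard_normal((d, d))
sigma1 = B1 @ B1.T + np.eye(d)            # two SPD covariance matrices
sigma2 = B2 @ B2.T + np.eye(d)

# Congruence action sigma -> A sigma A^T, with an invertible A built in SVD form.
Q1, Q2 = random_orthogonal(rng, d), random_orthogonal(rng, d)
A = Q1 @ np.diag([0.5, 2.0, 3.0]) @ Q2.T

d_before = fisher_rao_centered(sigma1, sigma2)
d_after = fisher_rao_centered(A @ sigma1 @ A.T, A @ sigma2 @ A.T)
print(d_before, d_after)                  # equal up to round-off

# Identifiability: V and V P (P orthogonal) give the same covariance V V^T,
# so the distribution only determines the class of V in GL(d)/O(d).
V = np.linalg.cholesky(sigma1)
P = random_orthogonal(rng, d)
print(np.allclose((V @ P) @ (V @ P).T, V @ V.T))   # True
```

The invariance holds because $(A\Sigma_1A^\top)^{-1}(A\Sigma_2A^\top) = A^{-\top}\Sigma_1^{-1}\Sigma_2A^\top$ is similar to $\Sigma_1^{-1}\Sigma_2$, so the spectrum, and hence the distance, is unchanged.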

Glossary

  • Affine group: The group of all invertible affine transformations of $\mathbb{R}^d$, written $\textup{Aff}(d)$, combining invertible linear maps and translations. "Denote the action of the affine group $\textup{Aff}(d)$ on $\Theta$ by"
  • Affine invariance: A property of a quantity (e.g., a distance) that remains unchanged under affine transformations. "affine invariance reduces the distance between two distributions to the distance from a canonical base distribution (Proposition~\ref{prop:FR-quotient})."
  • Axiom of choice: A set-theoretic principle guaranteeing the existence of selections from arbitrary collections of nonempty sets; often invoked to ensure existence without explicit construction. "We remark that, in general, such a self-map exists due to the axiom of choice."
  • Borel measurable function: A function measurable with respect to the Borel σ-algebra; measurability is what allows such a function to be integrated against measures on $\mathbb{R}^d$. "for some positive Borel measurable function $f_1$ on $[0,\infty)$."
  • Bregman divergence: A class of divergences generated from a convex function, measuring discrepancy via the function’s linearization; central in information geometry and convex analysis. "the associated Bregman divergence also satisfies"
  • Centered matrix scale family: A statistical family where distributions differ only by a matrix “scale” around a centered (zero-mean) base distribution. "We consider a centered matrix scale family and a dually flat divergence"
  • Characteristic function: The Fourier transform of a probability distribution, which uniquely determines the distribution and is useful for identifiability arguments. "the characteristic function $\varphi_{\mu, V}$ of the density $p(x \mid (\mu, V))$"
  • Chi-square divergence: The $f$-divergence with $f(t) = (t-1)^2$, measuring discrepancy through the squared deviation of the density ratio from 1; also known as Pearson’s χ² divergence. "the total variation distance and the chi-square divergence."
  • Congruence action: The transformation $\Sigma \mapsto A\Sigma A^\top$ of a matrix, used to express how positive definite matrices transform under linear changes of coordinates. "congruence actions on positive definite matrices."
  • Convex conjugate: The Legendre–Fenchel transform $F^*$ of a convex function $F$, defined by $F^*(\eta) = \sup_{\theta} \{\langle \theta, \eta \rangle - F(\theta)\}$. "The convex conjugate of $F$ is"
  • Diagonal action: A simultaneous group action on a product space, acting identically on each component (e.g., $(\theta_1,\theta_2)\mapsto(g\theta_1,g\theta_2)$). "the diagonal action on the product space $\Theta \times \Theta$ is usually not transitive."
  • Diffeomorphism: A smooth bijection with a smooth inverse between manifolds, preserving differentiable structure. "The map $\iota$ is a diffeomorphism between $\mathbb{R}^d \times GL(d,\mathbb{R})/O(d,\mathbb{R})$ and $SL(d+1,\mathbb{R})/SO(d+1,\mathbb{R})$."
  • Double coset: An equivalence class $HgH$ in a group $G$ relative to a subgroup $H$, capturing relative positions under left and right translations by $H$. "Define the double coset space by $H \backslash G / H$"
  • Dually flat divergence: A divergence compatible with a pair of dual affine coordinate systems induced by a convex potential (e.g., Bregman/Fenchel–Young), giving a flat information-geometric structure. "a dually flat divergence~\cite{ohara1996dualistic,IG-2016}"
  • Dually flat structure: A geometric structure on a manifold with dual, flat affine connections induced by a convex potential. "induces a dually flat structure"
  • Exponential family: A class of distributions with densities of the form $\exp(\langle\theta,x\rangle - F(\theta))\,h(x)$; the KL divergence becomes a Bregman divergence in natural parameters. "When the scale family is an exponential family and its Kullback--Leibler divergence is represented by this Bregman divergence, the two viewpoints coincide."
  • f-divergence: A broad class of divergences $D_f(p \,\|\, q) = \int_X q(x)\, f\big(p(x)/q(x)\big)\, \lambda(dx)$, defined by a convex function $f$ with $f(1) = 0$ applied to the likelihood ratio, encompassing KL, Hellinger, χ², and others. "every $f$-divergence is invariant under the group action."
  • Fenchel–Young divergence: A divergence defined as $F(\theta)+F^*(\eta')-\langle\theta,\eta'\rangle$ from a convex function $F$ and its conjugate $F^*$, generalizing Bregman divergences. "the Fenchel--Young divergence~\cite{Acharyya2013LearningToRank}"
  • Fisher metric: The Riemannian metric on a statistical manifold defined by the covariance of score functions (derivatives of the log-density). "the Fisher metric, specifically, $g^{F}_{ij}(\theta) = \int_X p_{\theta}(x) \, \frac{\partial}{\partial \theta_i} \log p_{\theta}(x) \, \frac{\partial}{\partial \theta_j} \log p_{\theta}(x) \, \lambda(dx), \ \theta = (\theta_i)_i \in \Theta.$"
  • Fisher–Rao distance: The geodesic distance induced by the Fisher metric on a statistical manifold, measuring intrinsic separation between distributions. "The Fisher--Rao distance is the Riemannian (geodesic) distance of the Fisher metric."
  • Geodesic distance: The length of the shortest path between two points on a Riemannian manifold, computed using the manifold’s metric. "The Fisher--Rao distance is the Riemannian (geodesic) distance of the Fisher metric."
  • General linear group: The group of all invertible $d\times d$ real matrices, denoted $GL(d,\mathbb{R})$. "Let $GL(d, \mathbb{R})$ be the general linear group on $\mathbb{R}^d$."
  • Homogeneous space: A space on which a group acts transitively, representable as a quotient $G/H$ by a stabilizer subgroup $H$. "The parameter space $\Theta$ is identified with a homogeneous space $G/H$"
  • Immersion: A smooth map whose differential is injective at each point; failure to be an immersion means some tangent directions are collapsed. "Therefore, this map is not an immersion if $d \ge 2$."
  • Invariant divergence: A divergence that remains unchanged under a specified group action consistent with the model’s symmetries. "an invariant divergence depends only on a maximal invariant of the pair of parameters."
  • Kullback–Leibler divergence: An asymmetric f-divergence measuring relative entropy between two distributions. "the Kullback--Leibler divergence"
  • Lebesgue measure: The standard translation-invariant measure on $\mathbb{R}^d$ used for integration in Euclidean spaces. "where $\ell$ denotes the Lebesgue measure."
  • Legendre-type convex function: A strictly convex, essentially smooth function whose gradient map is a diffeomorphism between dual spaces, enabling dual coordinates. "Then $F$ is a Legendre-type convex function"
  • Location–scale model: A family of distributions obtained by translating and scaling a base density, parameterized by location and scale. "We consider the multi-dimensional location-scale model"
  • Maximal invariant: A function of data/parameters that captures all information invariant under a group action; two inputs map to the same value iff they lie in the same orbit. "the map $m$ is a maximal invariant."
  • Multiplier (on a group action): A function $\chi$ mapping group elements to positive reals that adjusts a measure under the group action to ensure relative invariance. "Assume that there is a multiplier $\chi$ on $G$"
  • Orbit (of a group action): The set of points reachable by acting on a given element with all group elements. "Let the orbit of $(\theta_1, \theta_2) \in \Theta \times \Theta$ be $O_{(\theta_1, \theta_2)} \coloneqq \{g(\theta_1, \theta_2) \mid g \in G\}$."
  • Orthogonal group O(d,ℝ): The group of all $d\times d$ real matrices $Q$ with $Q^\top Q=I$, representing rotations and reflections. "there exists $P \in O(d, \mathbb{R})$ such that $V_1 = V_2 P$."
  • Pull-back (of a metric): The metric induced on the domain of a smooth map into a Riemannian manifold by pulling back inner products. "Denote the pull-back of the metric $g_{(\mu,\Sigma)}$ with respect to $\tau$ by $\bar{g}_{(\mu,V)}$."
  • Quotient space: A space formed by identifying points according to an equivalence relation, often arising from a group action. "We denote the quotient space with respect to this relation by $GL(d, \mathbb{R}) / O(d,\mathbb{R})$"
  • Riemann–Lebesgue lemma: A Fourier analysis result stating that the Fourier transform of an $L^1$ function vanishes at infinity. "By the Riemann--Lebesgue lemma, $\lim_{\|\xi\| \to \infty} \varphi_{0, I_d} (\xi) = 0$."
  • Riemannian symmetric space: A homogeneous Riemannian manifold with symmetries about every point, often expressed as a quotient like $G/K$. "the Riemannian symmetric space $SL(d+1,\mathbb{R})/SO(d+1,\mathbb{R})$."
  • σ-finite measure space: A measure space decomposable into countably many sets of finite measure, enabling many measure-theoretic constructions. "Let $(X, \mathcal{B}, \lambda)$ be a $\sigma$-finite measure space."
  • Singular value decomposition (SVD): A matrix factorization $S=U\Sigma W^\top$ with orthogonal $U,W$ and diagonal $\Sigma$ of nonnegative singular values, used to canonicalize relative scales. "Take a singular value decomposition $S = U\Sigma W^\top$"
  • Special linear group SL(d+1,ℝ): The group of $(d+1)\times(d+1)$ real matrices with determinant 1. "$j(A,b) \coloneqq (\det A)^{-1/(d+1)} \begin{pmatrix} A & b \\ 0 & 1 \end{pmatrix} \in SL(d+1,\mathbb{R}).$"
  • Spectral divergence: A divergence that depends only on the spectrum (eigenvalues) of a matrix derived from two parameters. "Such a divergence is called a spectral divergence."
  • Stabilizer: The subgroup of group elements that fix a given point under the action. "Let $H$ be the stabilizer of $\theta_0 \in \Theta$."
  • Symmetrization (of a matrix): The operation $sym(A)=(A+A^\top)/2$ extracting the symmetric part of a square matrix. "Denote the symmetrization of a square matrix $A$ by $sym(A)$"
  • Topological group: A group equipped with a topology where the group operations are continuous. "Let $G$ be a topological group."
  • Transitive action: A group action with a single orbit; any point can be moved to any other by some group element. "Assume that $G$ acts on $\Theta$ transitively."
  • Transformation model: A statistical model equipped with a group action on data and parameters, with densities transforming in a specified way (via a multiplier), inducing invariances. "[Group invariance in a transformation model]"
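For $d = 1$ with a Gaussian base density, the reduction to a maximal invariant can be checked by hand. The affine map $x \mapsto (x-\mu_1)/\sigma_1$ sends $(\mu_1,\sigma_1)$ to $(0,1)$, whose stabilizer is $O(1) = \{\pm 1\}$, so the pair $((\mu_1,\sigma_1),(\mu_2,\sigma_2))$ reduces to $(|\mu_2-\mu_1|/\sigma_1,\ \sigma_2/\sigma_1)$. The minimal sketch below (standard library only; all function names are hypothetical) uses the textbook closed forms of the Kullback–Leibler divergence, the squared Hellinger distance, and the univariate Fisher–Rao distance between normals, and checks that each is unchanged by an affine map and equals its value at the canonical representative.

```python
import math


def kl_normal(m1, s1, m2, s2):
    """KL( N(m1, s1^2) || N(m2, s2^2) ) in closed form."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5


def hellinger2_normal(m1, s1, m2, s2):
    """Squared Hellinger distance 1 - BC, with BC the Bhattacharyya coefficient."""
    v = s1 ** 2 + s2 ** 2
    return 1 - math.sqrt(2 * s1 * s2 / v) * math.exp(-((m1 - m2) ** 2) / (4 * v))


def fisher_rao_normal(m1, s1, m2, s2):
    """Fisher-Rao distance for univariate normals (sqrt(2) times a hyperbolic distance)."""
    arg = 1 + ((m1 - m2) ** 2 / 2 + (s1 - s2) ** 2) / (2 * s1 * s2)
    return math.sqrt(2) * math.acosh(arg)


def maximal_invariant(m1, s1, m2, s2):
    """Move (m1, s1) to (0, 1); the reflection x -> -x fixes (0, 1), so only |delta| survives."""
    return abs(m2 - m1) / s1, s2 / s1


def act(a, b, m, s):
    """Action of x -> a x + b (a != 0) on a location-scale parameter (m, s)."""
    return a * m + b, abs(a) * s


theta1, theta2 = (1.0, 2.0), (-0.5, 0.7)
a, b = -3.0, 4.0                      # negative a: scaling combined with a reflection
g1, g2 = act(a, b, *theta1), act(a, b, *theta2)
delta, r = maximal_invariant(*theta1, *theta2)

for div in (kl_normal, hellinger2_normal, fisher_rao_normal):
    before = div(*theta1, *theta2)
    after = div(*g1, *g2)
    canonical = div(0.0, 1.0, delta, r)  # value at the canonical representative
    print(div.__name__, before, after, canonical)
```

Choosing a negative scale $a$ exercises the reflection in $O(1)$, which is why only $|\mu_2-\mu_1|$ enters the invariant.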
