Group invariance of $f$-divergences and the Fisher--Rao distance
Abstract: Many statistical models have natural symmetries described by a group action. We study how such symmetries affect the comparison of two distributions. We work with a transformation model in which a group acts on both the sample space and the parameter space, and the densities transform with a multiplier. Under this assumption, we show that every $f$-divergence is invariant under the group action. As a consequence, an invariant divergence depends only on a maximal invariant of the pair of parameters. When the action on the parameter space is transitive, this maximal invariant is given by a double coset. We apply this result to multidimensional location-scale families, and we show that the same reduction applies to the Fisher--Rao distance.
Explain it Like I'm 14
Overview: What is this paper about?
This paper studies symmetry in statistics and how it helps compare two probability distributions. It shows that many popular “difference measures” between distributions (called f-divergences, like Kullback–Leibler or Hellinger) and a geometric distance (the Fisher–Rao distance) do not change when you move or rescale both distributions in the same way. Because of this invariance, the paper explains how to rewrite these comparisons using a smaller set of “essential” variables that capture only the relative position and size between the two distributions.
Goals and questions
The authors set out to answer a few simple-sounding questions:
- If a statistical model has natural symmetries (like shifting or scaling), do standard measures of difference between two distributions stay the same when we apply those symmetries to both distributions?
- What information about the two distributions really matters once we ignore those symmetries?
- Can we describe that leftover, essential information in a clean, general way?
- How does this work for common models built from shifting and scaling a base shape (location–scale families)?
- Does the same idea apply to the Fisher–Rao distance, a geometric way to measure how far apart two distributions are within the curved space formed by all distributions in the model?
Methods: How did they study it?
Think of a “group action” as a set of moves you can make—like shifting all data by the same amount, or stretching it in every direction—that follow consistent rules. A “statistical model” gives you a family of distributions (recipes for how likely outcomes are) that can be changed by these moves.
- Transformation model: The group acts both on the data (sample space) and on the parameters (like mean and scale). The probability density transforms in a way that accounts for stretching or compressing (using a “multiplier” that adjusts volume).
- Invariance: An f-divergence compares two distributions. The key trick is: if you apply the same group move to both distributions, the integral formula for the divergence stays the same (invariance).
- Maximal invariant: Imagine collecting all versions of a parameter pair you get by applying all allowed moves. That collection is an “orbit.” A maximal invariant is a way to label or pick the essential information for an orbit—what remains after you ignore all symmetric moves. Once you know a maximal invariant, any invariant divergence can be written as a function of it.
- Double coset (transitive case): When the group can move any parameter to any other parameter (a transitive action), the essential relative position of two parameters is captured by an object called a “double coset.” You can think of it as a standardized label of how one parameter sits relative to the other after you remove the freedom to move them around.
- Location–scale families: These are models where you take a base distribution and get new ones by shifting (location) and stretching (scale). In multiple dimensions, stretching can happen differently in different directions. The authors use a careful “quotient” description to ignore irrelevant rotations of the scale and then use a singular value decomposition (SVD) to break the relative scale into simple pieces (the singular values). They also split the relative location into blocks aligned with those stretches and summarize each block by its size (its norm).
- Fisher–Rao distance: This is a distance defined by the Fisher information geometry. Using the same symmetry ideas (affine invariance), you can reduce the problem of finding the distance between two distributions to finding the distance between a transformed “canonical” pair, which is simpler to study.
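The invariance step and the double-coset label above can be sketched in a few lines of mathematics. The notation here is an assumption made for illustration and may differ from the paper's: μ is a reference measure satisfying μ(gA) = χ(g)μ(A), the densities obey p_{gθ}(x) = χ(g)⁻¹ p_θ(g⁻¹x), and the f-divergence is written with the convention D_f(P‖Q) = ∫ q f(p/q) dμ.

```latex
% Assumed conventions (illustrative, not quoted from the paper):
%   \mu(gA) = \chi(g)\,\mu(A), \qquad p_{g\theta}(x) = \chi(g)^{-1}\, p_\theta(g^{-1}x)
\begin{align*}
D_f(P_{g\theta_1} \,\|\, P_{g\theta_2})
  &= \int p_{g\theta_2}(x)\,
     f\!\left(\frac{p_{g\theta_1}(x)}{p_{g\theta_2}(x)}\right) d\mu(x) \\
  &= \int \chi(g)^{-1}\, p_{\theta_2}(g^{-1}x)\,
     f\!\left(\frac{p_{\theta_1}(g^{-1}x)}{p_{\theta_2}(g^{-1}x)}\right) d\mu(x) \\
  &= \int p_{\theta_2}(y)\,
     f\!\left(\frac{p_{\theta_1}(y)}{p_{\theta_2}(y)}\right) d\mu(y)
     \qquad (x = g y,\ d\mu(g y) = \chi(g)\, d\mu(y)) \\
  &= D_f(P_{\theta_1} \,\|\, P_{\theta_2}).
\end{align*}

% Transitive case: write \theta_i = g_i \theta_0 and let H be the stabilizer of \theta_0.
% The element g_2^{-1} g_1 is unchanged by (g_1, g_2) \mapsto (g g_1, g g_2), and
% replacing g_i by g_i h_i with h_i \in H sends it to h_2^{-1} g_2^{-1} g_1 h_1,
% so the following label is well defined:
\[
(\theta_1, \theta_2) \;\longmapsto\; H\, g_2^{-1} g_1\, H \;\in\; H \backslash G / H .
\]
```

For the affine action x ↦ Ax + b on ℝᵈ with Lebesgue measure, the multiplier in this sketch is χ(g) = |det A|, which is the "volume adjustment" described in the transformation-model bullet.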
Main findings and why they matter
Here are the key results, explained simply:
- All f-divergences are invariant under the group action when the densities transform appropriately. If you move or scale both distributions in the same way, their f-divergence doesn’t change. This includes many popular measures like KL, Hellinger, total variation, and chi-square.
- Any invariant divergence depends only on the maximal invariant of the pair of parameters. In the transitive case, that maximal invariant can be written as a double coset, which labels the pair’s relative position after removing symmetry.
- For multidimensional location–scale families:
- The essential relative scale is captured by the singular values of the matrix that maps one scale to the other (these are the amounts of stretching in each principal direction).
- The essential relative location is captured by “block norms”—the sizes of parts of the relative shift aligned with groups of equal singular values. Together, these singular values and block norms are all that any invariant f-divergence can depend on.
- The same reduction applies to the Fisher–Rao distance. Because of affine invariance, you can transform any pair of distributions to a canonical form (one is the base reference), and the distance depends only on the same kinds of invariant variables (relative scale singular values and block norms of the transformed relative location).
- For certain dually flat divergences (like Bregman or Fenchel–Young), the paper shows a matching invariance under dual group actions. In scale-only models, these divergences depend only on the eigenvalues of the relative scale (they are “spectral” divergences), reinforcing the same symmetry story.
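As a concrete check of the findings above, the Gaussian case can be worked out numerically. The sketch below is not taken from the paper: it assumes Gaussian distributions N(μ, VVᵀ), uses the standard closed form for their KL divergence, and compares it with the same quantity rewritten through the singular values of S = V₂⁻¹V₁ and the norm of ν = V₂⁻¹(μ₁−μ₂). The function names and the test matrices are hypothetical.

```python
import numpy as np


def kl_gaussian(mu1, V1, mu2, V2):
    """KL(N(mu1, V1 V1^T) || N(mu2, V2 V2^T)) via the standard closed form."""
    d = len(mu1)
    cov1, cov2 = V1 @ V1.T, V2 @ V2.T
    diff = mu1 - mu2
    trace_term = np.trace(np.linalg.solve(cov2, cov1))
    quad_term = diff @ np.linalg.solve(cov2, diff)
    logdet1 = np.linalg.slogdet(cov1)[1]
    logdet2 = np.linalg.slogdet(cov2)[1]
    return 0.5 * (trace_term + quad_term - d + logdet2 - logdet1)


def kl_from_invariants(mu1, V1, mu2, V2):
    """Same KL, written only through singular values of S and the norm of nu."""
    S = np.linalg.solve(V2, V1)            # relative scale S = V2^{-1} V1
    nu = np.linalg.solve(V2, mu1 - mu2)    # relative location nu = V2^{-1}(mu1 - mu2)
    sigma = np.linalg.svd(S, compute_uv=False)
    return 0.5 * (np.sum(sigma**2) + nu @ nu - len(nu) - 2.0 * np.sum(np.log(sigma)))


# Hypothetical test data: random parameters and a random affine map x -> A x + b.
rng = np.random.default_rng(0)
d = 3
mu1, mu2 = rng.normal(size=d), rng.normal(size=d)
V1 = rng.normal(size=(d, d)) + 3.0 * np.eye(d)
V2 = rng.normal(size=(d, d)) + 3.0 * np.eye(d)
A = rng.normal(size=(d, d)) + 3.0 * np.eye(d)
b = rng.normal(size=d)

original = kl_gaussian(mu1, V1, mu2, V2)
moved = kl_gaussian(A @ mu1 + b, A @ V1, A @ mu2 + b, A @ V2)
reduced = kl_from_invariants(mu1, V1, mu2, V2)
print(original, moved, reduced)  # the three values should agree up to rounding
```

In this Gaussian example the KL divergence only needs the total norm of ν, which is itself determined by the block norms; other divergences or base densities may use the finer block structure.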
Why this matters:
- It simplifies formulas and calculations. Instead of handling all parameters directly, you reduce to a small set of invariant quantities.
- It explains why many different divergence formulas end up depending on the same “spectral” or relative features.
- It improves understanding of what truly affects a comparison between two distributions, and what is just a matter of coordinate choice (like a common shift or rotation).
Implications: What is the potential impact?
- Cleaner, more general theory: The results unify many divergence formulas under the same symmetry principle, showing they all depend on the same essential “relative” data.
- Practical simplification: In applications (statistics, machine learning, signal processing), you can transform problems into canonical forms and compute divergences or distances using only the invariant variables (singular values and block norms), saving effort and avoiding clutter from irrelevant coordinates.
- Geometric insight: For the Fisher–Rao distance and related geometries, the paper clarifies how symmetry reduces the problem and highlights what variables determine the distance.
- Foundation for further work: Even when closed-form answers are hard, knowing the exact invariant variables guides numerical methods, bounds, and approximations, and helps design invariant algorithms and tests.
In short, the paper shows that symmetry is not just a mathematical nicety—it is a powerful tool that strips away distractions, leaving only the core, relative features that truly matter when comparing two statistical distributions.
Knowledge Gaps
Knowledge gaps, limitations, and open questions
Below is a single, actionable list of what remains missing, uncertain, or unexplored in the paper.
- Characterize precisely when the transformation model assumption holds: give necessary and sufficient conditions on the base density and measure for the defining density-transformation equations to be satisfied beyond the radial/affine cases considered.
- Extend the invariance results to non-radial base densities: determine which classes of anisotropic/elliptical (or more general) base densities still yield invariance under appropriate group actions, and how the maximal invariants change.
- Identify minimal assumptions on the base density for identifiability in the quotient parameterization: provide necessary and sufficient conditions (beyond positivity, radiality, and finite second moment) under which the parameter is identifiable; analyze heavy-tailed cases (e.g., Student-t, Cauchy) where moments may not exist.
- Provide explicit, closed-form expressions for commonly used f-divergences as functions of the proposed invariants: for concrete models (Gaussian, Laplace, Student-t), derive the divergences (e.g., KL, Hellinger, χ², total variation) in terms of the singular values of the relative scale S and the block norms obtained from the relative location ν.
- Develop constructive, canonical selections of maximal invariants for general homogeneous spaces: beyond location–scale, give algorithmic procedures (not relying on the axiom of choice) to compute maximal-invariant representatives of double cosets and analyze their measurability/continuity.
- Analyze the topology/geometry of the double-coset space in general: establish when it admits a smooth/stratified manifold structure, how singularities arise (e.g., due to eigenvalue multiplicities), and how this affects continuity of invariant formulas.
- Quantify stability of the invariant parametrization under eigenvalue multiplicity changes: study continuity and differentiability of the invariant parametrization as singular values coalesce and block structures change, and design robust numerical schemes that avoid instability near multiplicity transitions.
- Address models with parameter-dependent support or zeros in the density: determine how invariance and f-divergence definitions extend when the density vanishes on sets of positive measure or has support varying with the parameter.
- Extend the reduction principle to non-transitive parameter actions: provide explicit maximal invariants for parameter pairs when the group action is not transitive, and characterize invariant divergences as functions of these invariants for such models.
- Generalize beyond f-divergences and Fisher–Rao: identify other divergences/distances (e.g., -divergences, Wasserstein distances) that exhibit analogous group invariance and reduction to maximal invariants; specify necessary conditions on the divergence and group.
- Classify congruence-invariant Bregman/Fenchel–Young divergences on positive definite matrices: beyond the log-determinant potential, characterize all convex generators whose Bregman/FY divergences are invariant under the congruence action (or the dual action), and relate them to spectral divergences.
- Bridge the f-divergence and dually flat perspectives: identify when f-divergences coincide with (or can be represented by) congruence-invariant Bregman divergences, and determine the cases in which this is possible.
- Fisher–Rao distance: obtain explicit formulas or efficient algorithms in the reduced coordinates for multivariate normals; quantify approximation errors for existing surrogates and exploit the invariance to design practical solvers for the two-point boundary-value problem.
- Analyze geometric properties of the quotient geometry: study curvature, geodesic completeness, and convexity of geodesic balls for the pullback metric on the quotient, and relate them to properties of the underlying model.
- Compare and relate the affine-invariant Fisher–Rao geometry to the symmetric-space construction of Lovrić–Min-Oo–Ruh: provide explicit transformations between distances, quantify differences, and identify scenarios where one geometry yields advantages (closed forms, numerical stability).
- Develop statistical procedures leveraging the invariant reduction: construct invariant tests, confidence sets, or estimators that depend only on the maximal invariants; analyze their optimality (minimax/invariance) and sampling distributions.
- Consider approximate invariance and robustness: define and analyze divergences under approximate group symmetry, providing perturbation bounds for when the model only nearly satisfies the transformation model.
- Extend to discrete sample spaces and other measures: verify or adapt the multiplier/quasi-invariance framework for counting or mixed measures and characterize the corresponding maximal invariants.
- High-dimensional computation: design numerically stable and scalable methods to compute singular values and block norms, including randomized SVD variants and error bounds, to make the invariant reduction practical for large d.
- Random matrix and probabilistic analysis of invariants: derive the distributional behavior of the singular values and block norms under natural priors or sampling models, enabling Bayesian or frequentist inference on divergences through invariant coordinates.
- Infinite-dimensional extensions: explore whether analogous invariance and maximal-invariant reductions hold for functional data or Gaussian processes under diffeomorphism or unitary actions, and identify analytical obstacles (e.g., non-separable spaces, lack of Haar measure).
- Alternative canonical parameterizations: investigate whether parameterizations other than the one used in the paper (e.g., via Cholesky or polar decompositions with constraints) yield simpler or more stable maximal invariants and formulas for the divergences and distances studied.
Practical Applications
Practical Applications Derived from the Paper’s Findings
This paper shows that many statistical dissimilarities (f-divergences, Fisher–Rao distance, and certain Bregman/Fenchel–Young divergences) are invariant under natural group actions and, crucially, depend only on maximal invariants of parameter pairs. In location–scale models, any invariant divergence between two distributions reduces to functions of (i) the singular values of the relative scale S = V₂⁻¹V₁ and (ii) block norms of a transformed relative location ν = V₂⁻¹(μ₁−μ₂). This reduction yields canonical, symmetry-respecting signatures for comparing distributions, enabling simpler, more robust workflows across sectors.
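The reduction described above can be sketched with standard linear algebra. The following is a minimal illustration, not the paper's implementation: the function name `invariant_signature` and the tolerance `tol` used to decide when two singular values count as equal are assumptions. It computes S = V₂⁻¹V₁ and ν = V₂⁻¹(μ₁−μ₂), takes an SVD of S, and returns the singular values together with the norms of the blocks of Uᵀν that correspond to equal singular values.

```python
import numpy as np


def invariant_signature(mu1, V1, mu2, V2, tol=1e-8):
    """Return (singular values of S, block norms of U^T nu).

    Blocks are runs of singular values that agree within the (assumed) tolerance.
    """
    S = np.linalg.solve(V2, V1)            # S = V2^{-1} V1
    nu = np.linalg.solve(V2, mu1 - mu2)    # nu = V2^{-1} (mu1 - mu2)
    U, sigma, _ = np.linalg.svd(S)         # S = U diag(sigma) W^T, sigma descending
    coords = U.T @ nu
    block_norms, start = [], 0
    for i in range(1, len(sigma) + 1):
        # Close the current block at the end or when the singular value drops.
        if i == len(sigma) or sigma[start] - sigma[i] > tol * max(1.0, sigma[start]):
            block_norms.append(np.linalg.norm(coords[start:i]))
            start = i
    return sigma, np.array(block_norms)


# Hypothetical example with a repeated singular value (2, 2) and a simple one (3).
mu1, mu2 = np.array([1.0, 2.0, 2.0]), np.zeros(3)
V1, V2 = np.diag([2.0, 2.0, 3.0]), np.eye(3)
sig, norms = invariant_signature(mu1, V1, mu2, V2)

# Apply a common affine map and independent rotations of each scale factor.
rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3)) + 3.0 * np.eye(3)
b = rng.normal(size=3)
Q1, _ = np.linalg.qr(rng.normal(size=(3, 3)))
Q2, _ = np.linalg.qr(rng.normal(size=(3, 3)))
sig_moved, norms_moved = invariant_signature(A @ mu1 + b, A @ V1 @ Q1, A @ mu2 + b, A @ V2 @ Q2)
print(sig, norms)
print(sig_moved, norms_moved)  # should match the first pair up to rounding
```

The right-multiplications by Q1 and Q2 mimic the quotient by O(d) used for radial base densities. Individual entries of Uᵀν are not stable under these changes when singular values repeat, which is why only the block norms are reported; near-equal singular values make the grouping sensitive to `tol`.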
Immediate Applications
The following can be implemented now with standard linear algebra (SVD/eigendecomposition), existing statistical libraries, and routine numerical optimization.
- Affine-invariant drift detection in sensor networks and IoT (Industry: robotics, energy, manufacturing)
- Use case: Detect distribution shifts in streaming sensor data that are due to real changes, not to re-calibration, unit conversions, or frame changes.
- Workflow: Estimate (μ, V) for sliding windows; compute S = V₂⁻¹V₁, ν = V₂⁻¹(μ₁−μ₂); compute invariant signatures (singular values of S and block norms of Uᵀν); evaluate an f-divergence expressed as a function of these invariants; alert if threshold crossed.
- Tools/products: An “Invariant Shift Monitor” module for edge gateways; integration with Kafka/Flink pipelines.
- Assumptions/dependencies: Transformation model reflects domain symmetries (location/scale changes, linear transforms); windows large enough for stable estimates; V₂ invertible and SPD where needed.
- Robust A/B testing and online experimentation under re-scaling/re-basing (Industry: software, e-commerce, ad-tech)
- Use case: Compare treatment/control distributions when metrics are re-scaled (e.g., currency inflation, unit changes) or re-based (e.g., baseline offsets).
- Workflow: Map both variants to canonical form using the m-map (V₂⁻¹(·)); compute invariant divergences (e.g., KL, Hellinger) as functions of SVD invariants; report effects unaffected by nuisance transformations.
- Tools/products: Add-on to experimentation platforms (e.g., LaunchDarkly/Optimizely) providing “affine-invariant effect size.”
- Assumptions: Metric distributions well-modeled by location–scale families (or reasonable approximations).
- Covariance monitoring via spectral divergences (Finance, Healthcare, Cybersecurity)
- Use case: Track changes in risk (finance), physiology (healthcare), or threat profiles (security) represented by SPD covariance matrices.
- Workflow: Compare Σ₁ and Σ₂ using spectral divergences (functions of eigenvalues of Σ₂⁻¹Σ₁); optionally use the Bregman divergence with log-det potential for congruence invariance.
- Tools/products: “Spectral Divergence Dashboard” for risk/physiology monitoring; Python/R package for SPD spectral distances.
- Assumptions: SPD matrices well-estimated; congruence action models the domain (e.g., linear mixing/rotations).
- Domain shift detection in vision and audio with illumination/contrast/volume invariance (Industry: computer vision, speech)
- Use case: Compare feature distributions across conditions (lighting, contrast, gain) without confounding nuisances.
- Workflow: Fit location–scale models to features; canonicalize pairs using V₂⁻¹; compute invariant divergences on SVD invariants; trigger retraining or adaptation.
- Tools/products: Plug‑in for MLOps monitoring (e.g., WhyLabs/Fiddler) that reports invariant drifts.
- Assumptions: Feature distributions approximately elliptical/radial; affine approximations sufficient.
- Medical imaging harmonization across scanners (Healthcare)
- Use case: Compare or harmonize MRI/CT intensity distributions across vendors/sites without bias from scanner scaling or baselines.
- Workflow: Estimate (μ, V) from regions of interest; reduce to S and ν; use invariant f-divergence or Fisher–Rao reduction to quantify harmonization gaps.
- Tools/products: PACS-compatible QC module for cross-site harmonization audits.
- Assumptions: Approximate location–scale behavior after standard pre-processing; identifiability ensured via quotient GL/O.
- Affine-invariant two-sample and goodness-of-fit tests (Academia, Industry)
- Use case: Hypothesis testing where nuisance transformations (translation, scaling, congruence) should be ignored.
- Workflow: Construct tests using maximal invariants (double coset or SVD-based invariants); critical values via permutation/bootstrapping on invariant statistics.
- Tools/products: Statistical libraries offering “group-invariant tests” for common families (Gaussian, elliptical).
- Assumptions: Group action well-specified and transitive on parameter space; finite-moment and integrability conditions satisfied.
- Canonical pair reduction layers for ML pipelines (Software/ML)
- Use case: Stabilize loss/metrics by removing nuisance parameters before comparing distributions.
- Workflow: Implement a “Canonical Pair Reducer” that maps (μ₁,V₁),(μ₂,V₂) to (Σ, r)-type invariants (singular values, block norms); feed to divergence modules.
- Tools/products: PyTorch/TensorFlow layers for invariant comparisons; scikit-learn transformers for SPD and location–scale data.
- Assumptions: Data fits transformation model; numerical stability of SVD in high dimensions.
- Fair and comparable cross-country/agency metrics (Policy, Public sector)
- Use case: Compare distributions (e.g., incomes, emissions) across units with different scales and baselines.
- Workflow: Apply invariant divergences to summarize differences ignoring units and baselines, focusing on shape/relative spread.
- Tools/products: Statistical policy dashboards reporting affine-invariant divergence scores.
- Assumptions: Appropriate pre-normalization; interpretations validated with domain experts.
- Device calibration and QA in manufacturing and daily life (Industry, Consumer electronics)
- Use case: Verify that two devices (scales, thermometers, microphones) produce statistically similar outputs up to affine calibration.
- Workflow: Collect paired measurements, estimate (μ,V), compute invariant divergence; accept/reject based on thresholds.
- Tools/products: Embedded QC apps; smartphone apps for sensor sanity checks.
- Assumptions: Sufficient sample size; affine model reasonable for device’s operating range.
- Privacy-aware change detection via invariant signatures (Cross-sector)
- Use case: Share only invariant summaries (singular values, block norms) for inter-org monitoring to reduce exposure of raw parameters.
- Workflow: Compute and transmit invariant signatures; central service computes divergence and alerts.
- Tools/products: APIs for invariant signature exchange.
- Assumptions: Invariants retain enough sensitivity for the task; privacy risk assessment needed.
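For the covariance-monitoring workflow in the list above, a spectral divergence can be sketched as follows. This is an illustration under stated assumptions rather than the paper's code: the names `relative_spectrum` and `logdet_divergence` are hypothetical, and the divergence shown is the Bregman divergence generated by the potential −log det, which equals tr(Σ₂⁻¹Σ₁) − log det(Σ₂⁻¹Σ₁) − d and therefore depends only on the eigenvalues of Σ₂⁻¹Σ₁.

```python
import numpy as np


def relative_spectrum(sigma1, sigma2):
    """Eigenvalues of sigma2^{-1} sigma1 for SPD inputs, via a symmetric reduction."""
    L = np.linalg.cholesky(sigma2)                 # sigma2 = L L^T
    X = np.linalg.solve(L, sigma1)                 # L^{-1} sigma1
    M = np.linalg.solve(L, X.T)                    # L^{-1} sigma1 L^{-T} (sigma1 symmetric)
    return np.linalg.eigvalsh(0.5 * (M + M.T))     # symmetrize against rounding


def logdet_divergence(sigma1, sigma2):
    """Bregman divergence of -log det: sum(lam - log(lam) - 1) over the relative spectrum."""
    lam = relative_spectrum(sigma1, sigma2)
    return float(np.sum(lam - np.log(lam) - 1.0))


# Hypothetical covariance matrices and an invertible mixing matrix A.
sigma1 = np.array([[2.0, 0.5], [0.5, 1.0]])
sigma2 = np.array([[1.0, 0.2], [0.2, 1.5]])
A = np.array([[1.0, 2.0], [0.0, 3.0]])

before = logdet_divergence(sigma1, sigma2)
after = logdet_divergence(A @ sigma1 @ A.T, A @ sigma2 @ A.T)  # congruence action
print(before, after)  # should agree up to rounding
```

Working with the symmetric matrix L⁻¹Σ₁L⁻ᵀ instead of Σ₂⁻¹Σ₁ directly is a design choice that keeps the eigenvalues real in floating point; both matrices have the same spectrum.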
Long-Term Applications
These require additional research, scaling, or domain adaptation (e.g., beyond location–scale/elliptical models, improved geodesic solvers, automated symmetry discovery).
- Automated symmetry discovery and invariant divergence selection (Academia, Software)
- Goal: Learn group actions from data and select divergences invariant to discovered symmetries; automate construction of maximal invariants beyond double cosets for complex models.
- Dependencies: Advances in representation learning and causal discovery; theoretical guarantees for learned symmetries.
- Closed-form or fast approximations for Fisher–Rao distances in multivariate normals and beyond (Academia, Industry)
- Goal: Practical, scalable approximations of Fisher–Rao distances using the paper’s reduction to canonical forms; GPU-accelerated geodesic solvers.
- Dependencies: Numerical geometry on SPD manifolds; efficient solvers for two-point boundary-value problems.
- Invariant metrics for evaluation of deep generative models (Industry, Academia)
- Goal: Replace ad hoc metrics with group-invariant divergences measuring distributional similarity while factoring out nuisance transformations (e.g., contrast/scale in images).
- Dependencies: Extensions to nonparametric or implicit models; robust estimation of invariants from deep features.
- Federated and privacy-preserving monitoring using invariant signatures (Cross-sector)
- Goal: Share only invariant summaries to monitor population changes across institutions without revealing raw parameters or identities.
- Dependencies: Differential privacy for invariant statistics; secure aggregation protocols.
- Robustness certification under affine nuisances (Software, Safety-critical systems)
- Goal: Certify ML systems for invariance to specified group actions by bounding invariant divergences under perturbations.
- Dependencies: Verified SVD/eigensolvers; formal methods for group actions in ML.
- Invariant control and planning in robotics (Robotics)
- Goal: Use invariant divergences to compare sensor/feature distributions across poses and loads for adaptive control and transfer.
- Dependencies: Extension to SE(3) and non-linear group actions; real-time invariant computation on embedded hardware.
- Regulatory standards for comparable statistics across jurisdictions (Policy)
- Goal: Define standardized, affine-invariant divergence benchmarks for cross-country metrics (e.g., inflation-adjusted distributions).
- Dependencies: Consensus on modeling assumptions; stakeholder alignment.
- Streaming, high-dimensional invariant analytics (Industry)
- Goal: Real-time computation of SVD/eigen spectra for thousands of sensors/features, with adaptive rank and sketching.
- Dependencies: Randomized linear algebra, sketching, and hardware acceleration.
- Extending invariant reduction to non-Euclidean sample spaces and complex groups (Academia)
- Goal: Maximal invariants and divergence invariance under diffeomorphism groups, Lie groups on manifolds, and beyond.
- Dependencies: Differential geometry, harmonic analysis on groups, scalable algorithms.
- Invariant hypothesis testing and confidence regions with nuisance parameters (Academia, Healthcare/Clinical trials)
- Goal: Develop tests and intervals that depend only on maximal invariants, improving power and interpretability under nuisance transformations.
- Dependencies: Asymptotic theory and finite-sample corrections; regulatory validation.
Notes on feasibility and assumptions that cut across applications:
- Transformation model validity: The underlying data should admit the assumed group action (e.g., location–scale, congruence for SPD). For the location–scale constructions, base densities are assumed radial/elliptical and strictly positive, with integrability (e.g., finite second moment).
- Identifiability and quotienting: In higher dimensions, identifiability is enforced via quotienting by O(d); implementations must map to GL(d)/O(d) or use spectral decompositions.
- Numerical considerations: Stable SVD/eigen computations are required; care is needed in high dimensions or ill-conditioned cases (regularization/sketching may be necessary).
- Model mismatch: If distributions deviate strongly from the assumed family, invariant divergences may misrepresent differences; diagnostic checks are recommended.
- Computational cost: Fisher–Rao distances often require numeric geodesics (no general closed form for multivariate normals at d>1); approximations or spectral alternatives may be preferable in production.
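To make the computational-cost note concrete, the one-dimensional case can be written down directly. The sketch below uses the standard closed form for the Fisher–Rao distance between univariate normal distributions, which follows from the Fisher metric (dμ² + 2dσ²)/σ² being a rescaled hyperbolic half-plane metric. It is a textbook formula, not a result quoted from the paper, and the function name is hypothetical.

```python
import math


def fisher_rao_normal_1d(mu1, s1, mu2, s2):
    """Fisher-Rao distance between N(mu1, s1^2) and N(mu2, s2^2), with s1, s2 > 0."""
    num = (mu1 - mu2) ** 2 + 2.0 * (s1 - s2) ** 2
    return math.sqrt(2.0) * math.acosh(1.0 + num / (4.0 * s1 * s2))


# Hypothetical parameters and a common affine map x -> a x + b.
mu1, s1, mu2, s2 = 0.3, 1.2, -1.0, 0.7
a, b = -3.0, 5.0

dist = fisher_rao_normal_1d(mu1, s1, mu2, s2)
dist_moved = fisher_rao_normal_1d(a * mu1 + b, abs(a) * s1, a * mu2 + b, abs(a) * s2)
print(dist, dist_moved)  # should agree up to rounding
```

The distance depends on the parameters only through (μ₁−μ₂)²/(σ₁σ₂) and (σ₁−σ₂)²/(σ₁σ₂), both of which are unchanged when the same affine map is applied to the two distributions.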
Glossary
- Affine group: The group of all invertible affine transformations on ℝᵈ, typically written Aff(d), combining linear maps and translations. "Denote the action of the affine group on by"
- Affine invariance: A property of a quantity (e.g., a distance) that remains unchanged under affine transformations. "affine invariance reduces the distance between two distributions to the distance from a canonical base distribution (Proposition~\ref{prop:FR-quotient})."
- Axiom of choice: A set-theoretic principle guaranteeing the existence of selections from arbitrary collections of nonempty sets; often invoked to ensure existence without explicit construction. "We remark that, in general, such a self-map exists due to the axiom of choice."
- Borel measurable function: A function measurable with respect to the Borel σ-algebra, ensuring integrability/measure-theoretic properties on ℝᵈ. "for some positive Borel measurable function on ."
- Bregman divergence: A class of divergences generated from a convex function, measuring discrepancy via the function’s linearization; central in information geometry and convex analysis. "the associated Bregman divergence also satisfies"
- Centered matrix scale family: A statistical family where distributions differ only by a matrix “scale” around a centered (zero-mean) base distribution. "We consider a centered matrix scale family and a dually flat divergence"
- Characteristic function: The Fourier transform of a probability distribution, uniquely describing it and useful for identifiability arguments. "the characteristic function of the density "
- Chi-square divergence: An f-divergence measuring discrepancy between probability distributions via a squared density ratio; also known as Pearson’s χ² divergence. "the total variation distance and the chi-square divergence."
- Congruence action: The transformation Σ ↦ AΣAᵀ of a symmetric matrix Σ by an invertible matrix A, used to express how positive definite matrices transform under linear changes of coordinates. "congruence actions on positive definite matrices."
- Convex conjugate: The Legendre–Fenchel transform φ*(y) = sup_x {⟨x, y⟩ − φ(x)} of a convex function φ, mapping it to a function on the dual space via a supremum over affine minorants. "The convex conjugate of is"
- Diagonal action: A simultaneous group action on a product space, acting identically on each component (e.g., g·(θ₁, θ₂) = (g·θ₁, g·θ₂)). "the diagonal action on the product space is usually not transitive."
- Diffeomorphism: A smooth bijection with a smooth inverse between manifolds, preserving differentiable structure. "The map is a diffeomorphism between and ."
- Double coset: An equivalence class HgH in a group G relative to a subgroup H, capturing relative positions under left and right translations by H. "Define the double coset space by "
- Dually flat divergence: A divergence compatible with a pair of dual affine coordinate systems induced by a convex potential (e.g., Bregman/Fenchel–Young), giving a flat information-geometric structure. "a dually flat divergence~\cite{ohara1996dualistic,IG-2016}"
- Dually flat structure: A geometric structure on a manifold with dual, flat affine connections induced by a convex potential. "induces a dually flat structure"
- Exponential family: A class of distributions with densities of the form p_θ(x) = h(x) exp(⟨θ, T(x)⟩ − ψ(θ)); KL divergence becomes a Bregman divergence in natural parameters. "When the scale family is an exponential family and its Kullback--Leibler divergence is represented by this Bregman divergence, the two viewpoints coincide."
- f-divergence: A broad class of divergences defined via a convex function f of the likelihood ratio, encompassing KL, Hellinger, χ², and others. "every f-divergence is invariant under the group action."
- Fenchel–Young divergence: A divergence defined as φ(x) + φ*(y) − ⟨x, y⟩ from a convex function φ and its conjugate φ*, generalizing Bregman divergences. "the Fenchel--Young divergence~\cite{Acharyya2013LearningToRank}"
- Fisher metric: The Riemannian metric on a statistical manifold defined by the covariance of score functions (derivatives of log-density). "the Fisher metric, specifically, "
- Fisher–Rao distance: The geodesic distance induced by the Fisher metric on a statistical manifold, measuring intrinsic separation between distributions. "The Fisher--Rao distance is the Riemannian (geodesic) distance of the Fisher metric."
- Geodesic distance: The length of the shortest path between two points on a Riemannian manifold, computed using the manifold’s metric. "The Fisher--Rao distance is the Riemannian (geodesic) distance of the Fisher metric."
- General linear group: The group of all invertible d×d real matrices, denoted GL(d,ℝ). "Let be the general linear group on ."
- Homogeneous space: A space on which a group G acts transitively, representable as a quotient G/H by a stabilizer subgroup H. "The parameter space is identified with a homogeneous space "
- Immersion: A smooth map whose differential is injective at each point; failure to be an immersion means directions are collapsed. "Therefore, this map is not an immersion if ."
- Invariant divergence: A divergence that remains unchanged under a specified group action consistent with the model’s symmetries. "an invariant divergence depends only on a maximal invariant of the pair of parameters."
- Kullback–Leibler divergence: An asymmetric f-divergence measuring relative entropy between two distributions. "the Kullback--Leibler divergence"
- Lebesgue measure: The standard translation-invariant measure on ℝᵈ used for integration in Euclidean spaces. "where denotes the Lebesgue measure."
- Legendre-type convex function: A strictly convex, essentially smooth function whose gradient map is a diffeomorphism between dual spaces, enabling dual coordinates. "Then is a Legendre-type convex function"
- Location–scale model: A family of distributions obtained by translating and scaling a base density, parameterized by location and scale. "We consider the multi-dimensional location-scale model"
- Maximal invariant: A function of data/parameters that captures all information invariant under a group action; two inputs map to the same value iff they lie in the same orbit. "the map is a maximal invariant."
- Multiplier (on a group action): A function χ mapping group elements to positive reals that adjusts a measure under the group action to ensure relative invariance. "Assume that there is a multiplier on "
- Orbit (of a group action): The set of points reachable by acting on a given element with all group elements. "Let the orbit of be ."
- Orthogonal group O(d,ℝ): The group of all d×d real matrices Q with QᵀQ = I, representing rotations and reflections. "there exists such that ."
- Pull-back (of a metric): The induced metric on a domain via a smooth map into a Riemannian manifold, defined by pulling back inner products. "Denote the pull-back of the metric with respect to by ."
- Quotient space: A space formed by identifying points according to an equivalence relation, often arising from a group action. "We denote the quotient space with respect to this relation by "
- Riemann–Lebesgue lemma: A Fourier analysis result stating that the Fourier transform of an L¹ (integrable) function vanishes at infinity. "By the Riemann--Lebesgue lemma, ."
- Riemannian symmetric space: A homogeneous Riemannian manifold with symmetries about every point, often expressed as a quotient of Lie groups. "the Riemannian symmetric space ."
- σ-finite measure space: A measure space decomposable into countably many sets of finite measure, enabling many measure-theoretic constructions. "Let be a σ-finite measure space."
- Singular value decomposition (SVD): A matrix factorization M = UDWᵀ with U and W orthogonal and D diagonal with nonnegative singular values, used to canonicalize relative scales. "Take a singular value decomposition "
- Special linear group SL(d+1,ℝ): The group of (d+1)×(d+1) real matrices with determinant 1.
- Spectral divergence: A divergence that depends only on the spectrum (eigenvalues) of a matrix derived from two parameters. "Such a divergence is called a spectral divergence."
- Stabilizer: The subgroup of group elements that fix a given point under the action. "Let be the stabilizer of ."
- Symmetrization (of a matrix): The operation extracting the symmetric part of a square matrix. "Denote the symmetrization of a square matrix by "
- Topological group: A group equipped with a topology where the group operations are continuous. "Let be a topological group."
- Transitive action: A group action with a single orbit; any point can be moved to any other by some group element. "Assume that acts on transitively."
- Transformation model: A statistical model equipped with a group action on data and parameters, with densities transforming in a specified way (via a multiplier), inducing invariances. "[Group invariance in a transformation model]"