Divergence Transformations: Theory & Applications

Updated 16 December 2025
  • Divergence Transformations are systematic operations that deform probability densities and quantum states in ways that modulate divergence measures, offering flexible interpolation between distributions.
  • They enable controlled adjustment of divergence metrics like Rényi and KL divergence, exhibiting an Abelian group structure and monotonic behavior critical for robust inference.
  • These transformations preserve key divergence properties in quantum and tensor settings, paving the way for scalable applications in adversarial defense, inverse problems, and model selection.

Divergence Transformations are structured mathematical operations that systematically deform or interpolate probability densities, quantum states, or vector/tensor fields in such a way as to alter divergence-based measures of distinguishability. These transformations are central to a broad range of fields, including information theory, quantum information, statistical inference, machine learning, adversarial robustness, and mathematical physics. The unifying theme is the use of divergence functionals—as generalized “distances”—and systematic transformations either to preserve, maximize, minimize, or interpolate such divergences, often leading to new invariances, group structures, monotonicities, or enhanced flexibility in modeling.

1. Algebraic Divergence Transformations and Their Group Structure

A canonical framework is provided by the recent “divergence transformations” $T_\alpha$ introduced for probability densities $p(x), q(x)$ sharing common support $\Omega\subset\mathbb{R}$ (Iagar et al., 12 Dec 2025). For a fixed $\alpha\in\mathbb{R}$, with normalization constant $K_\alpha[q\|p] = \int_\Omega p(x)^{1-\alpha} q(x)^{\alpha}\,dx$ and cumulative transforms

$$H(y) = \int_{x_i}^{y} q(s)\,ds, \qquad K(1-\alpha;x) = \int_{x_i}^{x} p(t)^{1-\alpha} q(t)^{\alpha}\,dt,$$

one defines the divergence transformation as

$$p_\alpha(y) = T_\alpha(p\|q)(y) = K_\alpha[q\|p]\,\left(\frac{p(x(y))}{q(x(y))}\right)^{\alpha} q(y),$$

where $x(y)$ is determined by $H(y) = K(1-\alpha;x)/K_\alpha[q\|p]$.

Key properties and group structure:

  • $T_0(p\|q) = q$ and $T_1(p\|q) = p$.
  • The set $\{T_\alpha\}_{\alpha\neq 0}$ forms an Abelian group under composition: $T_\beta(T_\alpha(p\|q)) = T_{\beta\alpha}(p\|q)$, with identity $T_1$ and inverse $T_{1/\alpha}$.
  • The action deforms $p$ toward (or away from) $q$ along a “divergence geodesic,” controlling the monotonic evolution of various divergences.

This algebraic framework enables both smooth interpolation and extreme deformation, with $\alpha\to 0$ collapsing $p_\alpha$ onto $q$, and $\alpha\to\alpha_c^+$ driving the divergence $D_\xi[T_\alpha(p\|q)\|q]$ to $+\infty$. The algebraic structure is realized via conjugation with differential-escort and relative escort maps.
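
As a simple numerical illustration of this construction (a minimal sketch; the truncated-Gaussian example densities on $[0,1]$ and the grid are arbitrary choices, not drawn from the cited paper), $T_\alpha(p\|q)$ can be evaluated on a grid by inverting $H(y) = K(1-\alpha;x)/K_\alpha[q\|p]$ with monotone interpolation:

```python
import numpy as np

def trapz(f, x):
    """Trapezoidal integral of f over the grid x."""
    return float(np.sum(np.diff(x) * (f[1:] + f[:-1]) / 2))

def cumtrapz(f, x):
    """Cumulative trapezoidal integral, starting at 0 at the left endpoint."""
    return np.concatenate(([0.0], np.cumsum(np.diff(x) * (f[1:] + f[:-1]) / 2)))

def divergence_transform(p, q, xs, alpha):
    """Evaluate p_alpha = T_alpha(p||q) on the grid xs, following the formulas above."""
    w = p ** (1 - alpha) * q ** alpha          # integrand of K(1-alpha; x)
    H = cumtrapz(q, xs)                        # H(y)
    Kcum = cumtrapz(w, xs)                     # K(1-alpha; x)
    K_alpha = Kcum[-1]                         # normalization K_alpha[q||p]
    x_of_y = np.interp(K_alpha * H, Kcum, xs)  # invert H(y) = K(1-alpha; x)/K_alpha
    p_at_x = np.interp(x_of_y, xs, p)
    q_at_x = np.interp(x_of_y, xs, q)
    return K_alpha * (p_at_x / q_at_x) ** alpha * q

# Example densities: two truncated Gaussians on [0, 1] with common support.
xs = np.linspace(0.0, 1.0, 4001)
p = np.exp(-0.5 * ((xs - 0.4) / 0.15) ** 2); p /= trapz(p, xs)
q = np.exp(-0.5 * ((xs - 0.6) / 0.15) ** 2); q /= trapz(q, xs)

for a in (0.0, 0.5, 1.0, 2.0):
    pa = divergence_transform(p, q, xs, a)
    print(f"alpha={a}: total mass = {trapz(pa, xs):.4f}")
# alpha=0 returns q, alpha=1 returns p, and intermediate alpha interpolates;
# each p_alpha is again a normalized density.
```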

2. Monotonicity and Parametric Control of Divergences

Divergence transformations induce monotone evolution of key divergence measures: Kullback–Leibler (KL), Rényi, and composite complexity functionals. For the Rényi divergence $D_\xi$, a fundamental identity (Iagar et al., 12 Dec 2025) holds:

$$D_\xi[p_\alpha\|q] = \alpha\, D_{\xi_\alpha}[p\|q] + (\alpha-1)\, D_\alpha[q\|p], \qquad \xi_\alpha = 1+\alpha(\xi-1).$$

For $\xi\ge 1$, the map $\alpha\mapsto D_\xi[p_\alpha\|q]$ is convex with a unique minimum at $\alpha=0$, so the divergence is strictly increasing in $\alpha$ for $\alpha>0$ and strictly decreasing in $\alpha$ for $\alpha<0$. Thus, for any $\xi\ge 1$:

  • $0<\alpha<1$: $D_\xi[p_\alpha\|q] < D_\xi[p\|q]$ (divergence-decreasing interpolation)
  • $\alpha>1$: $D_\xi[p_\alpha\|q] > D_\xi[p\|q]$ (divergence-increasing deformation)

This provides fine-grained parametric control, enabling both efficient smoothing (for inference or approximation) and controlled exaggeration (for robustness or discrimination).
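
As a concrete check (an illustration using the standard equal-variance Gaussian closed form, not a case worked in the cited paper): for $p=\mathcal{N}(\mu_1,\sigma^2)$ and $q=\mathcal{N}(\mu_2,\sigma^2)$, the Rényi divergence of any order $\beta>0$ is $D_\beta[p\|q] = \beta(\mu_1-\mu_2)^2/(2\sigma^2)$. Substituting this (for $\alpha>0$ and $\xi\ge 1$) into the identity above gives

$$D_\xi[p_\alpha\|q] = \big(\alpha\,\xi_\alpha + \alpha(\alpha-1)\big)\,\frac{(\mu_1-\mu_2)^2}{2\sigma^2} = \xi\,\alpha^2\,\frac{(\mu_1-\mu_2)^2}{2\sigma^2},$$

which is convex in $\alpha$, vanishes at $\alpha=0$, equals $D_\xi[p\|q]$ at $\alpha=1$, and reproduces the two regimes listed above.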

3. Transformations Preserving or Maximizing Divergence

A major theme is characterizing transformations that preserve specific divergence functionals:

  • Quantum/Matrix settings: Classification theorems show that any (bijective or surjective) transformation preserving Bregman, Jensen, Rényi, or Umegaki divergences on the space of quantum states (density operators), positive-definite matrices, or cones must be a unitary or antiunitary congruence, sometimes up to scaling (Gaál et al., 2015, Virosztek, 2016, Molnár et al., 2015). Apart from specific divergences such as Stein's loss, for which every invertible congruence is divergence-preserving, no nontrivial non-congruence preservers exist.
  • Distributional transformations: Spread Divergence (Zhang et al., 2018) systematically “spreads” both $P$ and $Q$ via convolution with a positive kernel, creating a full-support extension of standard $f$-divergences applicable to distributions with singular components or disjoint supports. Parameters of the spread can be learned to maximize discrimination.

An essential implication is that, under the corresponding regularity and functional conditions, divergence-preserving transformations encode the full symmetry group of the divergence, making them the natural morphisms for statistical or quantum information geometry.
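
The easy direction of these classification results is straightforward to check numerically. The sketch below (a minimal numpy/scipy illustration, not taken from the cited papers) verifies that a unitary congruence leaves the Umegaki relative entropy unchanged, while a generic invertible congruence generally does not:

```python
import numpy as np
from scipy.linalg import logm

rng = np.random.default_rng(0)

def random_density(d):
    """Random full-rank density matrix built from a complex Ginibre matrix."""
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = A @ A.conj().T
    return rho / np.trace(rho).real

def umegaki(rho, sigma):
    """Umegaki relative entropy S(rho||sigma) = Tr[rho (log rho - log sigma)]."""
    return np.trace(rho @ (logm(rho) - logm(sigma))).real

d = 4
rho, sigma = random_density(d), random_density(d)

# Unitary from the QR decomposition of a complex Ginibre matrix
U, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))

# Generic invertible congruence X -> M X M* (renormalized to unit trace)
M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
cong = lambda X: (M @ X @ M.conj().T) / np.trace(M @ X @ M.conj().T).real

print(f"S(rho||sigma)            = {umegaki(rho, sigma):.6f}")
print(f"after unitary congruence = {umegaki(U @ rho @ U.conj().T, U @ sigma @ U.conj().T):.6f}")
print(f"after generic congruence = {umegaki(cong(rho), cong(sigma)):.6f}")
# The unitary congruence reproduces the original value; the generic one does not.
```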

4. Application Domains: From Inverse Problems to Adversarial Robustness

Divergence transformations underpin advanced methodologies across domains:

  • Statistical Modeling, NMF, Inverse Problems: Scale-invariant and affine-invariant divergences extend standard discrepancy measures to non-normalized data (Lantéri, 2020). Transforming $q$ by a positive scaling factor (solved via a critical point equation) renders the divergence independent of overall flux, enabling constrained minimization consistent with physical or probabilistic constraints.
  • Direct Preference Optimization (DPO) in LLMs: DPO-Kernels (Das et al., 5 Jan 2025) replaces the standard KL regularization with a varied portfolio: Jensen-Shannon, Hellinger, Rényi, Bhattacharyya, Wasserstein, and $f$-divergences, together with kernel-induced feature transformations. These divergence-rich frameworks enable both more nuanced alignment and robust generalization via parameterizable, data-driven choices and hierarchical mixtures.
  • Adversarial Defense: DRIFT (Guesmi et al., 29 Sep 2025) explicitly trains stochastic ensembles of random filters to maximize “gradient divergence” (or minimize consensus). The architecture combines loss components controlling logit-space and Jacobian-space divergence. This reduces adversarial transferability and improves robust accuracy in neural networks.
  • Implicit Generative Models: Spread Divergence rescues maximum-likelihood training and latent-variable inference in settings where classical divergences are undefined (e.g., when the model’s support is not the full ambient space) (Zhang et al., 2018). Both EM and variational techniques leverage spread divergences as robust, consistent objectives; a minimal numerical sketch follows this list.
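
The sketch below illustrates the spread construction (the Gaussian noise kernel, grid, and toy distributions are arbitrary choices rather than the paper's experiments): convolving both distributions with a fixed positive kernel yields full-support densities whose KL divergence is finite, and still discriminative, even when the original supports are disjoint.

```python
import numpy as np

def spread_density(samples, grid, sigma):
    """Density of (sample + Gaussian noise), i.e. the kernel-spread distribution."""
    diffs = grid[None, :] - samples[:, None]
    comps = np.exp(-0.5 * (diffs / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return comps.mean(axis=0)

def kl(p, q, grid):
    """KL(p||q) by quadrature on the grid, with a tiny floor to avoid log(0)."""
    dx = grid[1] - grid[0]
    tiny = 1e-300
    return float(np.sum(p * (np.log(p + tiny) - np.log(q + tiny))) * dx)

rng = np.random.default_rng(0)
grid = np.linspace(-6.0, 10.0, 4001)

# Two empirical distributions with disjoint supports: the plain KL is undefined/infinite.
P_samples = rng.uniform(0.0, 1.0, size=500)
Q_samples = rng.uniform(3.0, 4.0, size=500)

for sigma in (0.25, 0.5, 1.0, 2.0):
    p_spread = spread_density(P_samples, grid, sigma)
    q_spread = spread_density(Q_samples, grid, sigma)
    print(f"sigma={sigma}: spread KL = {kl(p_spread, q_spread, grid):.3f}")
# Larger noise scales shrink the spread divergence; in practice the spread
# parameters can be tuned (or learned) to keep P and Q maximally distinguishable.
```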

5. Structural Invariance and Covariance Under Transformations

Divergence-based information measures inherit powerful invariance properties under structured transformations:

  • Quantum and operator-theoretic contexts: Any divergence satisfying the data-processing inequality is invariant under local isometric or unitary transformations (Popp et al., 4 Sep 2025). This guarantees that derived quantities (mutual information, conditional entropy, min- and max-information measures) are stable under changes of basis, dimension reduction, or local rotations. This structural invariance is foundational for protocol optimization and resource conversion in quantum information.
  • Mathematical physics and tensor analysis: Divergence-free symmetric tensors in continuum mechanics are strictly invariant under projective transformations (elements of $PGL(d+1,\mathbb{R})$), preserving conservation laws and enabling new dispersive bounds through geometrically informed changes of variables (Serre, 2021).
  • Lorentz transformations and relativistic signal processing: KL divergence and Fisher information exhibit critical divergence under Lorentz boosts, with clear order parameters and phase-transition analogies. The divergence transformation encapsulates the behavior of information-theoretic quantities under fundamental spacetime symmetries (Tsuruyama, 3 Jul 2025).

6. Parameter-Driven and Data-Driven Divergence Learning

Expanding the utility of divergence transformations, automatic selection frameworks (Dikmen et al., 2014) allow choosing the best divergence (and its parameter) for a task by recasting the selection as a maximum-likelihood problem. By algebraic reductions, $\alpha$-, $\gamma$-, and Rényi divergences are mapped into $\beta$- or $\alpha$-divergence forms, supporting unified model selection via specialized surrogates and Laplace approximations. DPO-Kernels further extend this with feature-kernel selection and divergence metrics based on support overlap, drift, kurtosis, and alignment tightness (Das et al., 5 Jan 2025).
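
A toy sketch of likelihood-based divergence selection (not the cited authors' algorithm) is given below. It relies only on the standard pairing of $\beta$-divergences with noise models ($\beta=2$ Gaussian, $\beta=1$ Poisson, $\beta=0$ exponential/Itakura-Saito) and an off-the-shelf NMF fitter; the data, rank, and scoring are illustrative choices.

```python
import numpy as np
from scipy.stats import norm, poisson, expon
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
W0 = rng.uniform(0.5, 1.5, size=(200, 3))
H0 = rng.uniform(0.5, 1.5, size=(3, 100))
X = rng.poisson(20.0 * W0 @ H0).astype(float)   # Poisson (KL-matched) ground truth
X = np.maximum(X, 1.0)                          # keep entries strictly positive for beta=0

def matched_loglik(beta, X, Xhat):
    """Log-likelihood of X under the noise model matched to the beta-divergence."""
    if beta == 2:                               # Gaussian noise, plug-in ML variance
        sigma = np.sqrt(np.mean((X - Xhat) ** 2))
        return norm.logpdf(X, loc=Xhat, scale=sigma).sum()
    if beta == 1:                               # Poisson counts with mean Xhat
        return poisson.logpmf(X, mu=Xhat).sum()
    if beta == 0:                               # exponential noise with mean Xhat
        return expon.logpdf(X, scale=Xhat).sum()
    raise ValueError(beta)

scores = {}
for beta in (2, 1, 0):
    model = NMF(n_components=3, beta_loss=float(beta), solver="mu",
                init="random", max_iter=500, random_state=0)
    W = model.fit_transform(X)
    Xhat = np.maximum(W @ model.components_, 1e-10)
    scores[beta] = matched_loglik(beta, X, Xhat)

best = max(scores, key=scores.get)
print(scores)
print(f"selected beta-divergence: beta={best}")  # Poisson-generated data should favor beta=1
```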

| Transformation Type | Main Domain | Conditions for Invariance/Structure |
|---|---|---|
| Divergence transformation $T_\alpha$ | Probability densities | Group structure, monotonicity in the $\alpha$ parameter |
| Spread Divergence | Implicit generative models | Full-support kernels, learnable parameters |
| Congruence maps | Quantum states, PSD cones | Preserve Bregman/Jensen/Rényi; only (anti)unitary |
| Scale-invariant divergences | NMF, inverse problems | Affine/scale invariance via optimal scaling |
| Isometric/unitary maps | Quantum info, tensors | Data-processing, projective or local invariance |

The table above summarizes key divergence transformation classes, their application domains, and structural properties.

7. Extensions and Theoretical Significance

Divergence transformations are not limited to scalar or matrix-valued densities. The extension to composite or relative complexity measures, e.g., LMC–Rényi and Fisher-type complexities, reveals that monotonicity and scaling properties are preserved under divergence interpolations and their algebraic conjugates (Iagar et al., 12 Dec 2025). In all cases, invariances and symmetries induced by divergence transformations sharpen our understanding of statistical structure, operational capacity, and robustness in both classical and quantum systems.

The unifying insight is that the landscape of divergences becomes malleable and structurally rich when one considers their associated transformation groups, enabling both flexible modeling and principled exploitation of invariance, monotonicity, and group symmetries for theory and applications.
