αβ-log-det Divergence Overview
- αβ-log-det divergence is a two-parameter family defining information divergences for SPD matrices, unifying and interpolating classical measures like Stein’s loss and AIRM.
- It exhibits strong geometric traits such as affine invariance, nonnegativity, spectral separability, and smoothness, with extensions to infinite-dimensional positive-definite operators.
- Its flexible parameter mapping allows precise tuning for diverse applications in machine learning, signal processing, and multiway covariance modeling.
The αβ-log-det divergence (also known as the Alpha-Beta Log-Determinant or ABLD divergence) is a two-parameter family of information divergences between symmetric positive definite (SPD) matrices and admits generalizations to positive-definite operators on infinite-dimensional Hilbert spaces. The αβ-log-det divergence parametrically unifies and interpolates between well-known matrix divergences, including Stein’s loss, Jensen–Bregman LogDet (JBLD) divergence, Bhattacharyya (log-det zero) divergence, and the affine-invariant Riemannian distance (AIRM). Its parametric flexibility, strong geometric and invariance properties, and generalizability to the infinite-dimensional setting make it a central object in information geometry, statistical manifold analysis, and applications involving SPD representations in machine learning and signal processing (Cichocki et al., 2014, Cherian et al., 2021, Quang, 2017, Quang, 2016).
1. General Definition and Domain
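Let $P, Q \in \mathbb{R}^{n \times n}$ be real symmetric positive-definite matrices. For real parameters $\alpha, \beta$ satisfying $\alpha \neq 0$, $\beta \neq 0$, and $\alpha + \beta \neq 0$, the αβ-log-det divergence is defined as
$$D_{AB}^{(\alpha,\beta)}(P \,\|\, Q) = \frac{1}{\alpha\beta} \log\det\left( \frac{\alpha\,(PQ^{-1})^{\beta} + \beta\,(PQ^{-1})^{-\alpha}}{\alpha+\beta} \right).$$
Letting $\lambda_1, \dots, \lambda_n$ be the eigenvalues of $PQ^{-1}$, an equivalent spectral form is
$$D_{AB}^{(\alpha,\beta)}(P \,\|\, Q) = \frac{1}{\alpha\beta} \sum_{i=1}^{n} \log\left( \frac{\alpha\,\lambda_i^{\beta} + \beta\,\lambda_i^{-\alpha}}{\alpha+\beta} \right).$$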
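This function admits continuous extensions to the boundary cases ($\alpha = 0$, $\beta = 0$, $\alpha + \beta = 0$) via L’Hôpital’s rule. The limit $\alpha, \beta \to 0$ yields the squared affine-invariant Riemannian metric (up to a factor of $1/2$), and the other degenerate parameter combinations yield their own closed forms (Cichocki et al., 2014).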
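The boundary cases can be written out explicitly. The block below is a sketch obtained by expanding the generic spectral summand $\frac{1}{\alpha\beta}\log\frac{\alpha\lambda_i^{\beta}+\beta\lambda_i^{-\alpha}}{\alpha+\beta}$ to first order in the vanishing parameter, where $\lambda_i$ are assumed to be the eigenvalues of $PQ^{-1}$; it is a derivation from that summand, not a quotation of the cited paper.

```latex
\begin{align*}
\alpha = 0,\ \beta \neq 0:&\quad
  D_{AB}^{(0,\beta)}(P\|Q) = \frac{1}{\beta^{2}} \sum_{i}
  \left( \lambda_i^{\beta} - \log \lambda_i^{\beta} - 1 \right) \\
\beta = 0,\ \alpha \neq 0:&\quad
  D_{AB}^{(\alpha,0)}(P\|Q) = \frac{1}{\alpha^{2}} \sum_{i}
  \left( \lambda_i^{-\alpha} - \log \lambda_i^{-\alpha} - 1 \right) \\
\alpha = -\beta \neq 0:&\quad
  D_{AB}^{(\alpha,-\alpha)}(P\|Q) = \frac{1}{\alpha^{2}} \sum_{i}
  \left( \log \lambda_i^{\alpha} - \log\!\left( 1 + \log \lambda_i^{\alpha} \right) \right),
  \qquad 1 + \alpha \log \lambda_i > 0 \\
\alpha = \beta = 0:&\quad
  D_{AB}^{(0,0)}(P\|Q) = \frac{1}{2} \sum_{i} \log^{2} \lambda_i
\end{align*}
```

As a consistency check, letting $\alpha \to 0$ in any of the first three lines gives the last one, since $x - \log(1+x) \approx x^{2}/2$ for small $x$.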
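The domain of validity depends on the signs of $\alpha$ and $\beta$. For $\alpha\beta > 0$ (same sign), the divergence is always finite. For $\alpha\beta < 0$, additional eigenvalue constraints are required: e.g., $\lambda_i > |\beta/\alpha|^{1/(\alpha+\beta)}$ for $\alpha > 0$, $\beta < 0$.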
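The spectral form and the domain restriction translate fairly directly into code. The following is a minimal sketch for the generic case only ($\alpha \neq 0$, $\beta \neq 0$, $\alpha+\beta \neq 0$), assuming NumPy; the function names `generalized_eigenvalues` and `ab_logdet` are illustrative and do not come from the cited works.

```python
import numpy as np

def generalized_eigenvalues(P, Q):
    """Eigenvalues of P Q^{-1}, computed via Cholesky whitening of Q."""
    L = np.linalg.cholesky(Q)                        # Q = L L^T
    M = np.linalg.solve(L, np.linalg.solve(L, P).T)  # L^{-1} P L^{-T}
    return np.linalg.eigvalsh((M + M.T) / 2)         # symmetrize against round-off

def ab_logdet(P, Q, alpha, beta):
    """Generic AB log-det divergence (no limit cases handled here)."""
    if alpha == 0 or beta == 0 or alpha + beta == 0:
        raise ValueError("boundary cases need their own limit formulas")
    lam = generalized_eigenvalues(P, Q)
    terms = (alpha * lam**beta + beta * lam**(-alpha)) / (alpha + beta)
    if np.any(terms <= 0):
        # Can only happen when alpha and beta have opposite signs.
        raise ValueError("eigenvalues of P Q^{-1} fall outside the domain")
    return float(np.sum(np.log(terms)) / (alpha * beta))

P = np.array([[2.0, 0.5], [0.5, 1.0]])
Q = np.array([[1.0, 0.2], [0.2, 3.0]])
d_same_sign = ab_logdet(P, Q, 1.0, 2.0)  # always finite for same-sign parameters
```

Whitening with the Cholesky factor keeps the eigenproblem symmetric, which avoids the spurious complex eigenvalues that a direct decomposition of the non-symmetric product $PQ^{-1}$ can produce.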
2. Special Cases and Recovery of Classical Divergences
By tuning $(\alpha, \beta)$, the αβ-log-det divergence specializes to many classical divergences:
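| $(\alpha, \beta)$ | Divergence (Name) | Formula / Property |
|---|---|---|
| $(0,1)$ or $(1,0)$ | Stein’s loss / Burg divergence | $\operatorname{tr}(PQ^{-1}) - \log\det(PQ^{-1}) - n$ for $(0,1)$; $\operatorname{tr}(QP^{-1}) - \log\det(QP^{-1}) - n$ for $(1,0)$ |
| $\alpha + \beta = 1$ | $\alpha$-log-det divergence | $\frac{1}{\alpha(1-\alpha)} \log \frac{\det(\alpha P + (1-\alpha) Q)}{\det(P)^{\alpha} \det(Q)^{1-\alpha}}$ |
| $\alpha = \beta$ | Power log-det | $\frac{1}{\alpha^{2}} \log\det \frac{(PQ^{-1})^{\alpha} + (PQ^{-1})^{-\alpha}}{2}$ |
| $\alpha = \beta = 1/2$ | S-divergence (JBLD) | $4 \log \frac{\det\left(\frac{P+Q}{2}\right)}{\sqrt{\det P \, \det Q}}$ |
| $\alpha = \beta = 0$ (limit) | Affine-invariant Riemannian metric (AIRM) | $\frac{1}{2} \lVert \log(Q^{-1/2} P Q^{-1/2}) \rVert_F^{2}$ |
| $(1,0)$ or $(0,1)$, sym. | Jeffreys-KL | $\frac{1}{2} \operatorname{tr}(PQ^{-1} + QP^{-1}) - n$ |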
Thus, the αβ-family comprises Burg (Stein’s) divergence, JBLD, Bhattacharyya distance ($\alpha = \beta = 1/2$, up to a constant factor), symmetrized KL, AIRM, Cauchy-Schwarz, and other divergences as instances or limits (Cichocki et al., 2014, Cherian et al., 2021, Quang, 2016).
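Several of these reductions can be checked numerically. The sketch below assumes NumPy and the generic spectral form $\frac{1}{\alpha\beta}\sum_i \log\frac{\alpha\lambda_i^{\beta}+\beta\lambda_i^{-\alpha}}{\alpha+\beta}$ with $\lambda_i$ the eigenvalues of $PQ^{-1}$; the helper `ab_logdet_spectral` and the test matrices are illustrative, and the limit cases are approximated by small parameter values rather than evaluated exactly.

```python
import numpy as np

def ab_logdet_spectral(lam, alpha, beta):
    """Generic AB log-det divergence from the eigenvalues lam of P Q^{-1}."""
    lam = np.asarray(lam, dtype=float)
    terms = (alpha * lam**beta + beta * lam**(-alpha)) / (alpha + beta)
    return float(np.sum(np.log(terms)) / (alpha * beta))

def logdet(M):
    return np.linalg.slogdet(M)[1]

P = np.array([[2.0, 0.5], [0.5, 1.0]])
Q = np.array([[1.0, 0.2], [0.2, 3.0]])
n = P.shape[0]
R = P @ np.linalg.inv(Q)
lam = np.linalg.eigvals(R).real  # real and positive for SPD P, Q

# alpha = beta = 1/2: four times the S-divergence (JBLD).
s_div = logdet((P + Q) / 2) - 0.5 * (logdet(P) + logdet(Q))
d_half = ab_logdet_spectral(lam, 0.5, 0.5)

# alpha -> 0, beta = 1: Stein's loss, approached with a small alpha.
stein = np.trace(R) - logdet(R) - n
d_near_stein = ab_logdet_spectral(lam, 1e-6, 1.0)

# alpha = beta -> 0: half the squared AIRM, approached with small parameters.
airm_half_sq = 0.5 * np.sum(np.log(lam) ** 2)
d_near_airm = ab_logdet_spectral(lam, 1e-3, 1e-3)
```

The S-divergence identity holds exactly, while the two limit comparisons agree only up to a small error that shrinks with the chosen parameter.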
3. Key Properties and Structure: Parameter Map
The repertoire of divergences covered by the αβ-log-det family is well captured by examining the $(\alpha, \beta)$-plane:
- $\alpha = 0$ or $\beta = 0$: generalized Stein's loss forms
- $\alpha = \beta$: symmetric power-log-det divergences
- $\alpha + \beta = 1$: $\alpha$-log-det divergences (relating to $\alpha$-connections in information geometry)
- $\alpha = \beta = 1/2$: S-divergence (JBLD)
- $\alpha = \beta = 0$: AIRM
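Properties of $D_{AB}^{(\alpha,\beta)}(P \,\|\, Q)$ (Cichocki et al., 2014):
- Nonnegativity: $D_{AB}^{(\alpha,\beta)}(P \,\|\, Q) \geq 0$, with equality iff $P = Q$
- Definiteness: $D_{AB}^{(\alpha,\beta)}(P \,\|\, Q) = 0 \iff P = Q$
- Smoothness: smooth in $\alpha$ and $\beta$ except for removable singularities
- Spectral Separability: function of the eigenvalues $\lambda_i$ of $PQ^{-1}$ only, i.e., $D_{AB}^{(\alpha,\beta)}(P \,\|\, Q) = \sum_i D_{AB}^{(\alpha,\beta)}(\lambda_i \,\|\, 1)$
- Affine-invariance: $D_{AB}^{(\alpha,\beta)}(APA^{\top} \,\|\, AQA^{\top}) = D_{AB}^{(\alpha,\beta)}(P \,\|\, Q)$ for invertible $A$
- Scale-invariance: $D_{AB}^{(\alpha,\beta)}(cP \,\|\, cQ) = D_{AB}^{(\alpha,\beta)}(P \,\|\, Q)$, $c > 0$
- Inversion Duality: $D_{AB}^{(\alpha,\beta)}(P^{-1} \,\|\, Q^{-1}) = D_{AB}^{(-\alpha,-\beta)}(P \,\|\, Q)$
- Dual Symmetry: $D_{AB}^{(\alpha,\beta)}(P \,\|\, Q) = D_{AB}^{(\beta,\alpha)}(Q \,\|\, P)$
- On Diagonal ⟹ Metric: For $\alpha = \beta$, $\sqrt{D_{AB}^{(\alpha,\alpha)}(P \,\|\, Q)}$ satisfies the triangle inequality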
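The invariance and duality properties lend themselves to a quick numerical sanity check. The following sketch assumes NumPy and the generic spectral form of the divergence on the eigenvalues of $PQ^{-1}$; the helper names, the random test matrices, and the parameter values are all illustrative choices. The triangle inequality is tested only at $\alpha = \beta = 1/2$ and on a single random triple, so it illustrates the property rather than proving it.

```python
import numpy as np

def whitened_eigs(P, Q):
    """Eigenvalues of P Q^{-1} via the symmetric matrix L^{-1} P L^{-T}, Q = L L^T."""
    L = np.linalg.cholesky(Q)
    M = np.linalg.solve(L, np.linalg.solve(L, P).T)
    return np.linalg.eigvalsh((M + M.T) / 2)

def ab_logdet(P, Q, alpha, beta):
    """Generic AB log-det divergence (alpha, beta, alpha + beta all nonzero)."""
    lam = whitened_eigs(P, Q)
    terms = (alpha * lam**beta + beta * lam**(-alpha)) / (alpha + beta)
    return float(np.sum(np.log(terms)) / (alpha * beta))

rng = np.random.default_rng(0)

def random_spd(n):
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

n = 4
P, Q, Z = random_spd(n), random_spd(n), random_spd(n)
A = rng.standard_normal((n, n)) + n * np.eye(n)  # invertible with probability one
a, b = 0.7, 1.3
base = ab_logdet(P, Q, a, b)

checks = {
    "affine": ab_logdet(A @ P @ A.T, A @ Q @ A.T, a, b) - base,
    "scale": ab_logdet(3.0 * P, 3.0 * Q, a, b) - base,
    "inversion": ab_logdet(np.linalg.inv(P), np.linalg.inv(Q), a, b)
                 - ab_logdet(P, Q, -a, -b),
    "dual_symmetry": base - ab_logdet(Q, P, b, a),
}

# Triangle inequality for the square root on the diagonal alpha = beta = 1/2.
d = lambda X, Y: np.sqrt(ab_logdet(X, Y, 0.5, 0.5))
triangle_slack = d(P, Z) + d(Z, Q) - d(P, Q)
```

Each entry of `checks` should be zero up to floating-point error, and `triangle_slack` should be nonnegative.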
This suggests that by moving along key lines or points in parameter space, practitioners can target divergences appropriate for a given problem, modulating sensitivity to spectrum or volume (Cichocki et al., 2014).
4. Infinite-Dimensional and Operator Extensions
The αβ-log-det divergence extends naturally from finite-dimensional SPD matrices to infinite-dimensional positive-definite operators, notably unitized trace-class and Hilbert-Schmidt operators on separable Hilbert spaces (Quang, 2017, Quang, 2016). In this context, extensions of the determinant—the Fredholm determinant for trace-class and the Hilbert–Carleman determinant for Hilbert–Schmidt perturbations—enable well-defined divergence formulas.
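Infinite-dimensional Alpha-Beta Log-Det divergences retain the log-determinant structure of the finite-dimensional definition, with the matrix determinant replaced by the appropriate extended determinant and with arguments $A + \gamma I$ and $B + \mu I$ that are positive-definite unitized operators. Taking the limits $\alpha \to 0$, $\beta \to 0$ recovers the infinite-dimensional AIRM, and the family also contains the infinite-dimensional Stein divergence as a special case (Quang, 2017, Quang, 2016).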
For regularized kernel covariance operators ($C_X + \gamma I$, $C_Y + \mu I$ with $\gamma, \mu > 0$) in a reproducing kernel Hilbert space (RKHS), the αβ-log-det divergence reduces to a closed-form expression in the corresponding Gram matrices, providing a computational path for infinite-dimensional divergences in learning applications (Quang, 2016).
5. Connections to Gaussian and Information Geometric Divergences
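For multivariate normal densities $p(x) = \mathcal{N}(x; \mu, P)$ and $q(x) = \mathcal{N}(x; \mu, Q)$ with a common mean, the continuous gamma divergence $D_{AC}^{(\alpha,\beta)}$ is directly expressible via the αβ-log-det divergence:
$$D_{AC}^{(\alpha,\beta)}(p \,\|\, q) = \frac{1}{2\alpha\beta} \log \frac{\det\left( \frac{\alpha}{\alpha+\beta} Q + \frac{\beta}{\alpha+\beta} P \right)}{\det(Q)^{\frac{\alpha}{\alpha+\beta}} \det(P)^{\frac{\beta}{\alpha+\beta}}} = \frac{1}{2} D_{AB}^{(\alpha,\beta)}\left( (P^{-1/2} Q P^{-1/2})^{\frac{1}{\alpha+\beta}} \,\Big\|\, I \right),$$
with $\alpha > 0$, $\beta > 0$ (Cichocki et al., 2014).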
Special cases:
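- $\alpha \to 1$, $\beta \to 0$: Kullback-Leibler divergence
- $\alpha = \beta = 1/2$: Bhattacharyya distance (up to a constant factor)
- $\alpha + \beta = 1$, $0 < \alpha < 1$: Rényi divergence of order $\alpha$ (up to a constant factor)
- $\alpha = \beta = 1$: Cauchy-Schwarz divergence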
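The link between the gamma divergence and a log-det expression can be illustrated in one dimension. The sketch below assumes NumPy, zero-mean univariate Gaussians with variances `p_var` and `q_var`, and the gamma divergence in the form $\frac{1}{\alpha\beta(\alpha+\beta)}\log\frac{(\int p^{\alpha+\beta})^{\alpha}(\int q^{\alpha+\beta})^{\beta}}{(\int p^{\alpha}q^{\beta})^{\alpha+\beta}}$. That definition, the function names, and the integration grid are assumptions made for this example. It compares a brute-force numerical integral against the closed form $\frac{1}{2\alpha\beta}\log\frac{(\alpha q + \beta p)/(\alpha+\beta)}{q^{\alpha/(\alpha+\beta)}\,p^{\beta/(\alpha+\beta)}}$.

```python
import numpy as np

def gamma_div_numeric(p_var, q_var, alpha, beta):
    """Gamma divergence between N(0, p_var) and N(0, q_var) by a Riemann sum."""
    x = np.linspace(-40.0, 40.0, 200001)
    dx = x[1] - x[0]
    p = np.exp(-x**2 / (2 * p_var)) / np.sqrt(2 * np.pi * p_var)
    q = np.exp(-x**2 / (2 * q_var)) / np.sqrt(2 * np.pi * q_var)
    integral = lambda f: float(np.sum(f) * dx)  # tails are negligible on this grid
    s = alpha + beta
    num = alpha * np.log(integral(p**s)) + beta * np.log(integral(q**s))
    den = s * np.log(integral(p**alpha * q**beta))
    return (num - den) / (alpha * beta * s)

def gamma_div_closed_form(p_var, q_var, alpha, beta):
    """Scalar (n = 1) instance of the log-det closed form."""
    s = alpha + beta
    mix = (alpha * q_var + beta * p_var) / s
    geo = q_var ** (alpha / s) * p_var ** (beta / s)
    return float(np.log(mix / geo) / (2 * alpha * beta))

# alpha = beta = 1 is the Cauchy-Schwarz case.
cs_numeric = gamma_div_numeric(2.0, 0.5, 1.0, 1.0)
cs_closed = gamma_div_closed_form(2.0, 0.5, 1.0, 1.0)
```

For $\alpha = \beta = 1$ both expressions reduce to $\frac{1}{2}\log\frac{p+q}{2\sqrt{pq}}$, the Cauchy-Schwarz divergence between the two densities.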
This reveals that αβ-log-det divergences not only cover matrix-level divergences but also bridge to statistical divergences between distributions.
6. Symmetrizations and Metric Properties
The αβ-log-det divergence is asymmetric in general. Two canonical symmetrizations are employed (Cichocki et al., 2014):
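- Type-1 (Jeffreys-style): $D_{ABS1}^{(\alpha,\beta)}(P \,\|\, Q) = \frac{1}{2}\left[ D_{AB}^{(\alpha,\beta)}(P \,\|\, Q) + D_{AB}^{(\alpha,\beta)}(Q \,\|\, P) \right]$
- Type-2 (Jensen–Shannon style): $D_{ABS2}^{(\alpha,\beta)}(P \,\|\, Q) = \frac{1}{2}\left[ D_{AB}^{(\alpha,\beta)}\left(P \,\Big\|\, \frac{P+Q}{2}\right) + D_{AB}^{(\alpha,\beta)}\left(Q \,\Big\|\, \frac{P+Q}{2}\right) \right]$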
Type-1 subsumes the Jeffreys-KL divergence (when $(\alpha, \beta) = (1, 0)$ or vice versa). The unsymmetrized divergence is itself already symmetric when $\alpha = \beta$. For $\alpha = \beta = 0$, the square root of the divergence yields the affine-invariant Riemannian metric (up to a constant factor), which satisfies the triangle inequality and thus endows the space of SPD matrices with a geodesic structure (Cichocki et al., 2014).
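Both symmetrizations are thin wrappers around the base divergence. The sketch below assumes NumPy and the usual conventions: Type-1 averages the two argument orders, and Type-2 averages the divergences of each argument to the arithmetic mean $(P+Q)/2$. The function names are illustrative, and only the generic parameter range is handled.

```python
import numpy as np

def ab_logdet(P, Q, alpha, beta):
    """Generic AB log-det divergence from the eigenvalues of P Q^{-1}."""
    L = np.linalg.cholesky(Q)
    M = np.linalg.solve(L, np.linalg.solve(L, P).T)  # L^{-1} P L^{-T}
    lam = np.linalg.eigvalsh((M + M.T) / 2)
    terms = (alpha * lam**beta + beta * lam**(-alpha)) / (alpha + beta)
    return float(np.sum(np.log(terms)) / (alpha * beta))

def sym_type1(P, Q, alpha, beta):
    """Jeffreys-style: average over both argument orders."""
    return 0.5 * (ab_logdet(P, Q, alpha, beta) + ab_logdet(Q, P, alpha, beta))

def sym_type2(P, Q, alpha, beta):
    """Jensen-Shannon style: average divergence to the arithmetic mean."""
    mid = (P + Q) / 2
    return 0.5 * (ab_logdet(P, mid, alpha, beta) + ab_logdet(Q, mid, alpha, beta))

P = np.array([[2.0, 0.5], [0.5, 1.0]])
Q = np.array([[1.0, 0.2], [0.2, 3.0]])
t1 = sym_type1(P, Q, 0.3, 1.2)
t2 = sym_type2(P, Q, 0.3, 1.2)
```

On the diagonal $\alpha = \beta$ the base divergence is already symmetric, so `sym_type1` returns the same value as `ab_logdet` there.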
7. Applications, Learning, and Multiway Extensions
Recent work exploits the αβ-log-det divergence as a learnable meta-divergence for applications requiring similarity assessment between SPD matrices (Cherian et al., 2021). In supervised and unsupervised tasks (e.g., discriminative dictionary learning, clustering), the parameters $(\alpha, \beta)$, which may even be vector-valued, are optimized jointly with SPD dictionaries/centroids via Riemannian optimization schemes, harnessing the flexibility of the divergence family.
Empirical evaluation on multiple vision benchmarks demonstrates the advantage of automatically selecting from the αβ-family, with per-dictionary-atom vector-valued parameters yielding further performance gains (Cherian et al., 2021).
A multiway (Kronecker-separable) extension exists for block-covariance structures. For SPD matrices $P = P_1 \otimes \cdots \otimes P_K$ and $Q = Q_1 \otimes \cdots \otimes Q_K$ with Kronecker decompositions, the divergence splits into a sum of per-mode αβ-log-det divergences and a scale term, extending the Hilbert, AIRM, and Stein divergences to the multi-tensor setting (Cichocki et al., 2014).
This suggests a natural fit for multi-modal and tensor-valued covariance modeling, especially for multiway Gaussian models or tensor factor analysis.
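The per-mode splitting can be seen concretely in the simplest case. The sketch below covers only the $\alpha = \beta = 0$ (squared-AIRM) member with two Kronecker factors, and the identity it checks is derived here directly from the fact that the eigenvalues of a Kronecker product are products of the factor eigenvalues. It is not the general multiway formula of the cited paper. With $D(P \| Q) = \frac{1}{2}\sum_i \log^2 \lambda_i$ and factor sizes $n_1$, $n_2$, the claim is $D(P_1 \otimes P_2 \,\|\, Q_1 \otimes Q_2) = n_2 D(P_1 \| Q_1) + n_1 D(P_2 \| Q_2) + \log\det(P_1 Q_1^{-1}) \log\det(P_2 Q_2^{-1})$. NumPy and all names below are assumptions of the example.

```python
import numpy as np

def whitened_eigs(P, Q):
    """Eigenvalues of P Q^{-1} via Cholesky whitening of Q."""
    L = np.linalg.cholesky(Q)
    M = np.linalg.solve(L, np.linalg.solve(L, P).T)  # L^{-1} P L^{-T}
    return np.linalg.eigvalsh((M + M.T) / 2)

def half_sq_airm(P, Q):
    """alpha = beta = 0 member of the family: 0.5 * sum(log^2 lambda_i)."""
    return 0.5 * float(np.sum(np.log(whitened_eigs(P, Q)) ** 2))

rng = np.random.default_rng(1)

def random_spd(n):
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

n1, n2 = 3, 2
P1, Q1 = random_spd(n1), random_spd(n1)
P2, Q2 = random_spd(n2), random_spd(n2)

# Divergence between the full Kronecker-structured matrices.
full = half_sq_airm(np.kron(P1, P2), np.kron(Q1, Q2))

# Per-mode divergences plus a cross term built from the log-determinants.
logdet1 = float(np.sum(np.log(whitened_eigs(P1, Q1))))
logdet2 = float(np.sum(np.log(whitened_eigs(P2, Q2))))
split = n2 * half_sq_airm(P1, Q1) + n1 * half_sq_airm(P2, Q2) + logdet1 * logdet2
```

The cross term vanishes when either factor pair has equal determinants, which is one way to read the "scale term" in the decomposition.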
The αβ-log-det divergence family provides a unified, parameterized, and geometrically well-motivated divergence for SPD matrices and operators, enabling fine control over spectrum sensitivity and metric properties, with theoretical guarantees and proven utility in information geometry, statistical learning, and high-dimensional covariance modeling (Cichocki et al., 2014, Cherian et al., 2021, Quang, 2017, Quang, 2016).