
αβ-log-det Divergence Overview

Updated 2 March 2026
  • αβ-log-det divergence is a two-parameter family defining information divergences for SPD matrices, unifying and interpolating classical measures like Stein’s loss and AIRM.
  • It exhibits strong geometric traits such as affine invariance, nonnegativity, spectral separability, and smoothness, with extensions to infinite-dimensional positive-definite operators.
  • Its flexible parameter mapping allows precise tuning for diverse applications in machine learning, signal processing, and multiway covariance modeling.

The αβ-log-det divergence (also known as the Alpha-Beta Log-Determinant or ABLD divergence) is a two-parameter family of information divergences between symmetric positive definite (SPD) matrices and admits generalizations to positive-definite operators on infinite-dimensional Hilbert spaces. The αβ-log-det divergence parametrically unifies and interpolates between well-known matrix divergences, including Stein’s loss, Jensen–Bregman LogDet (JBLD) divergence, Bhattacharyya (log-det zero) divergence, and the affine-invariant Riemannian distance (AIRM). Its parametric flexibility, strong geometric and invariance properties, and generalizability to the infinite-dimensional setting make it a central object in information geometry, statistical manifold analysis, and applications involving SPD representations in machine learning and signal processing (Cichocki et al., 2014, Cherian et al., 2021, Quang, 2017, Quang, 2016).

1. General Definition and Domain

Let $P, Q \in S^n_+$ be $n\times n$ real symmetric positive-definite matrices. For real parameters $\alpha, \beta$ satisfying $\alpha \neq 0$, $\beta \neq 0$, and $\alpha + \beta \neq 0$, the αβ-log-det divergence is defined as

$$D^{(\alpha, \beta)}_{AB}(P \,\|\, Q) = \frac{1}{\alpha \beta} \log \det \left( \frac{\alpha (P^{-1}Q)^{\beta} + \beta (P^{-1}Q)^{-\alpha}}{\alpha + \beta} \right).$$

Letting $\lambda_1,\ldots,\lambda_n$ be the eigenvalues of $M = P^{-1}Q$, an equivalent spectral form is

$$D^{(\alpha, \beta)}_{AB}(P \,\|\, Q) = \frac{1}{\alpha \beta} \sum_{i=1}^n \log \left( \frac{\alpha\,\lambda_i^{\beta} + \beta\,\lambda_i^{-\alpha}}{\alpha + \beta} \right).$$
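The spectral form translates almost directly into code. The following is a minimal NumPy sketch, not a reference implementation: the function names `rel_eigs` and `ab_logdet` are illustrative, and the Cholesky-based whitening is just one of several ways to obtain the eigenvalues of $P^{-1}Q$. It covers only the generic case where $\alpha$, $\beta$, and $\alpha+\beta$ are all nonzero.

```python
import numpy as np

def rel_eigs(P, Q):
    """Eigenvalues of P^{-1} Q for SPD P and Q.

    Uses P = L L^T and the symmetric matrix L^{-1} Q L^{-T},
    which is similar to P^{-1} Q and so has the same eigenvalues.
    """
    L = np.linalg.cholesky(P)
    X = np.linalg.solve(L, Q)          # L^{-1} Q
    S = np.linalg.solve(L, X.T).T      # L^{-1} Q L^{-T}
    return np.linalg.eigvalsh((S + S.T) / 2)

def ab_logdet(P, Q, alpha, beta):
    """Spectral form of the divergence (alpha, beta, alpha + beta all nonzero)."""
    lam = rel_eigs(P, Q)
    terms = (alpha * lam**beta + beta * lam**(-alpha)) / (alpha + beta)
    if np.any(terms <= 0):
        raise ValueError("eigenvalues lie outside the domain for this (alpha, beta)")
    return np.sum(np.log(terms)) / (alpha * beta)

P = np.diag([1.0, 2.0])
Q = np.diag([2.0, 2.0])                # P^{-1} Q = diag(2, 1)
print(ab_logdet(P, Q, alpha=1.0, beta=1.0))   # log(1.25), about 0.223
print(ab_logdet(P, P, alpha=0.5, beta=0.5))   # numerically zero for identical arguments
```

Working with eigenvalues avoids forming fractional matrix powers explicitly, and the positivity check guards the logarithm when $\alpha\beta < 0$.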

This function admits continuous extensions to the boundary cases ($\alpha = 0$, $\beta = 0$, $\alpha+\beta=0$) via L’Hôpital’s rule. The limit $\alpha, \beta \to 0$ yields, up to a constant factor, the squared affine-invariant Riemannian distance, and the other degenerate parameter combinations yield further closed forms (Cichocki et al., 2014).
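As an illustration of how such a limit arises, the following sketch works out the $\alpha,\beta \to 0$ case along the diagonal $\alpha=\beta=s$ only, using the spectral form and the expansion $\log\cosh x = x^2/2 + O(x^4)$. Restricting to the diagonal is a simplifying assumption made here, not the general two-parameter argument of the cited paper.

```latex
% Diagonal limit alpha = beta = s -> 0; \lambda_i are the eigenvalues of P^{-1}Q.
\begin{align*}
D^{(s,s)}_{AB}(P\,\|\,Q)
  &= \frac{1}{s^2}\sum_{i=1}^n \log\frac{\lambda_i^{s}+\lambda_i^{-s}}{2}
   = \frac{1}{s^2}\sum_{i=1}^n \log\cosh\bigl(s\log\lambda_i\bigr) \\
  &= \frac{1}{s^2}\sum_{i=1}^n \left(\frac{s^2\log^2\lambda_i}{2} + O(s^4)\right)
   \;\xrightarrow[s\to 0]{}\; \frac12\sum_{i=1}^n \log^2\lambda_i
   = \frac12\,\bigl\|\log\bigl(P^{-1/2}QP^{-1/2}\bigr)\bigr\|_F^2 .
\end{align*}
```

Along this path the limit is one half of the squared affine-invariant Riemannian distance.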

The domain of validity depends on the signs of $\alpha$ and $\beta$. For $\alpha\cdot\beta > 0$, the divergence is always finite. For $\alpha\cdot\beta < 0$, additional eigenvalue constraints are required: e.g., $\lambda_i > |\beta/\alpha|^{1/(\alpha+\beta)}$ for $\alpha>0>\beta$.
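The constraint simply keeps the argument of the logarithm positive. The short sketch below makes that concrete for the case $\alpha>0>\beta$ stated above. The helper names `ab_domain_ok` and `eigen_threshold` are hypothetical and introduced only for illustration.

```python
import numpy as np

def ab_domain_ok(lam, alpha, beta):
    """True if every log argument (alpha*l^beta + beta*l^-alpha)/(alpha+beta) is positive."""
    lam = np.asarray(lam, dtype=float)
    arg = (alpha * lam**beta + beta * lam**(-alpha)) / (alpha + beta)
    return bool(np.all(arg > 0))

def eigen_threshold(alpha, beta):
    """Lower bound |beta/alpha|^(1/(alpha+beta)) on the eigenvalues, for alpha > 0 > beta."""
    if not (alpha > 0 > beta) or alpha + beta == 0:
        raise ValueError("this sketch only covers alpha > 0 > beta with alpha + beta != 0")
    return abs(beta / alpha) ** (1.0 / (alpha + beta))

alpha, beta = 2.0, -1.0
print(eigen_threshold(alpha, beta))            # 0.5 for this parameter pair
print(ab_domain_ok([0.6, 3.0], alpha, beta))   # all eigenvalues above the threshold
print(ab_domain_ok([0.4, 3.0], alpha, beta))   # one eigenvalue below the threshold
```

Checking the log argument directly (rather than the threshold) is the safer test in practice, since it does not depend on which sign pattern of $(\alpha,\beta)$ applies.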

2. Special Cases and Recovery of Classical Divergences

By tuning $(\alpha, \beta)$, the αβ-log-det divergence specializes to many classical divergences:

| $(\alpha, \beta)$ | Divergence (name) | Formula / property |
|---|---|---|
| $(1,0)$; $(0,1)$ with $P$ and $Q$ swapped | Stein’s loss / Burg divergence | $\operatorname{tr}(Q^{-1}P) - \log\det(Q^{-1}P) - n$ |
| $(\alpha,1-\alpha)$ | $\alpha$-log-det divergence | $\frac{1}{\alpha(1-\alpha)} \log\det\left[ \alpha M^{1-\alpha} + (1-\alpha) M^{-\alpha} \right]$ |
| $(s,s)$ | Power log-det | $\frac{1}{s^2} \log\det\left[ (M^s + M^{-s})/2 \right]$ |
| $(1/2,1/2)$ | S-divergence (JBLD) | $4 \log\det\frac{P+Q}{2} - 2\log\det P - 2\log\det Q$ |
| $(0,0)$ | Squared affine-invariant Riemannian metric (AIRM) | $\frac12 \lVert \log(P^{-1/2} Q P^{-1/2}) \rVert_F^2$ |
| $(1,0)$ or $(0,1)$, Type-1 symmetrized | Jeffreys-KL | $\frac12\left[\operatorname{tr}(P Q^{-1}) + \operatorname{tr}(Q P^{-1}) - 2n\right]$ |

Thus, the αβ-family comprises Burg (Stein’s) divergence, JBLD, Bhattacharyya distance ($\sqrt{\mathrm{JBLD}}$), symmetrized KL, AIRM, Cauchy-Schwarz, and other divergences as instances or limits (Cichocki et al., 2014, Cherian et al., 2021, Quang, 2016).
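Several of these reductions can be checked numerically from the spectral form. The sketch below is an illustrative sanity check on random SPD matrices, under the convention that $\lambda_i$ are the eigenvalues of $P^{-1}Q$. The helper names and the small parameter values used to approach the limits are assumptions of this example.

```python
import numpy as np

def rel_eigs(P, Q):
    """Eigenvalues of P^{-1} Q via the symmetric matrix L^{-1} Q L^{-T}, P = L L^T."""
    L = np.linalg.cholesky(P)
    S = np.linalg.solve(L, np.linalg.solve(L, Q).T).T
    return np.linalg.eigvalsh((S + S.T) / 2)

def ab_logdet(P, Q, alpha, beta):
    lam = rel_eigs(P, Q)
    terms = (alpha * lam**beta + beta * lam**(-alpha)) / (alpha + beta)
    return np.sum(np.log(terms)) / (alpha * beta)

def logdet(X):
    return np.linalg.slogdet(X)[1]

rng = np.random.default_rng(0)

def random_spd(n):
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

P, Q = random_spd(4), random_spd(4)
n = P.shape[0]

# (1/2, 1/2): closed form of the S-divergence (JBLD).
s_div = 4 * logdet((P + Q) / 2) - 2 * logdet(P) - 2 * logdet(Q)

# (1, beta -> 0): Stein's loss, approached with a small positive beta.
A = np.linalg.solve(Q, P)                      # Q^{-1} P
stein = np.trace(A) - logdet(A) - n

# (s, s) with s -> 0: half the squared Frobenius norm of the matrix logarithm.
half_sq_log = 0.5 * np.sum(np.log(rel_eigs(P, Q)) ** 2)

print(np.isclose(ab_logdet(P, Q, 0.5, 0.5), s_div))
print(np.isclose(ab_logdet(P, Q, 1.0, 1e-7), stein, atol=1e-4))
print(np.isclose(ab_logdet(P, Q, 1e-3, 1e-3), half_sq_log, atol=1e-4))
```

The last two comparisons only approximate the limits, so they use a loose absolute tolerance rather than exact equality.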

3. Key Properties and Structure: $(\alpha, \beta)$ Parameter Map

The repertoire of divergences covered by the αβ-log-det family is well captured by examining the $(\alpha, \beta)$-plane:

  • $\beta=0$ or $\alpha=0$: generalized Stein's loss forms
  • $\alpha=\beta$: symmetric power-log-det divergences
  • $\alpha+\beta=1$: $\alpha$-log-det divergences (relating to $\alpha$-connections in information geometry)
  • $(\frac12,\frac12)$: S-divergence (JBLD)
  • $(0,0)$: AIRM

Properties of $D^{(\alpha,\beta)}_{AB}(P\,\|\,Q)$ (Cichocki et al., 2014):

  • Nonnegativity: $D \geq 0$
  • Definiteness: $D=0 \Leftrightarrow P=Q$
  • Smoothness: in $(\alpha,\beta)$ and $(P,Q)$ except for removable singularities
  • Spectral Separability: a function of the eigenvalues $\lambda_i$ of $P^{-1}Q$ only, i.e., $D(P\,\|\,Q) = \sum_i D(1\,\|\,\lambda_i)$
  • Affine-invariance: $D(LPL^T \,\|\, LQL^T ) = D(P\,\|\,Q)$ for invertible $L$
  • Scale-invariance: $D(cP\,\|\,cQ) = D(P\,\|\,Q)$, $c>0$
  • Inversion Duality: $D^{(-\alpha,-\beta)}_{AB}(P\,\|\,Q) = D^{(\alpha,\beta)}_{AB}(P^{-1}\,\|\,Q^{-1})$
  • Dual Symmetry: $D^{(\alpha,\beta)}_{AB}(P\,\|\,Q) = D^{(\beta,\alpha)}_{AB}(Q\,\|\,P)$
  • On Diagonal ⟹ Metric: For $\alpha=\beta$, $d^{(\alpha)}(P,Q) = \sqrt{D^{(\alpha,\alpha)}_{AB}(P\,\|\,Q)}$ satisfies the triangle inequality

This suggests that by moving along key lines or points in parameter space, practitioners can target divergences appropriate for a given problem, modulating sensitivity to spectrum or volume (Cichocki et al., 2014).
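The invariance and duality properties listed above are easy to probe numerically. The following sketch checks four of them on random inputs. It is a sanity check under the spectral definition, not a proof, and the helper names, seed, and parameter values are arbitrary choices for illustration.

```python
import numpy as np

def rel_eigs(P, Q):
    """Eigenvalues of P^{-1} Q via the symmetric matrix L^{-1} Q L^{-T}, P = L L^T."""
    L = np.linalg.cholesky(P)
    S = np.linalg.solve(L, np.linalg.solve(L, Q).T).T
    return np.linalg.eigvalsh((S + S.T) / 2)

def ab_logdet(P, Q, alpha, beta):
    lam = rel_eigs(P, Q)
    terms = (alpha * lam**beta + beta * lam**(-alpha)) / (alpha + beta)
    return np.sum(np.log(terms)) / (alpha * beta)

rng = np.random.default_rng(1)
n = 4

def random_spd(n):
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

P, Q = random_spd(n), random_spd(n)
T = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # a well-conditioned invertible map
a, b = 0.7, 0.4

d = ab_logdet(P, Q, a, b)
affine = ab_logdet(T @ P @ T.T, T @ Q @ T.T, a, b)   # affine invariance
scaled = ab_logdet(3.0 * P, 3.0 * Q, a, b)           # scale invariance
dual = ab_logdet(Q, P, b, a)                         # dual symmetry
inv_lhs = ab_logdet(P, Q, -a, -b)                    # inversion duality, left side
inv_rhs = ab_logdet(np.linalg.inv(P), np.linalg.inv(Q), a, b)

print(np.isclose(d, affine), np.isclose(d, scaled), np.isclose(d, dual))
print(np.isclose(inv_lhs, inv_rhs))
```

All four identities follow from the fact that the divergence depends on $P$ and $Q$ only through the eigenvalues of $P^{-1}Q$, which is why a single eigenvalue routine suffices for every check.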

4. Infinite-Dimensional and Operator Extensions

The αβ-log-det divergence extends naturally from finite-dimensional SPD matrices to infinite-dimensional positive-definite operators, notably unitized trace-class and Hilbert-Schmidt operators on separable Hilbert spaces (Quang, 2017, Quang, 2016). In this context, extensions of the determinant—the Fredholm determinant for trace-class and the Hilbert–Carleman determinant for Hilbert–Schmidt perturbations—enable well-defined divergence formulas.

Infinite-dimensional Alpha-Beta Log-Det divergences take the form

$$D_{\alpha,\beta}(A, B) = \frac{1}{\alpha \beta} \log \frac{\det_{X}(\alpha A+\beta B)}{\left[\det_{X}(A)\right]^\alpha\left[\det_{X}(B)\right]^\beta}$$

where $\det_X$ denotes the appropriate extended determinant and $A, B$ are positive-definite unitized operators. The limits $\alpha\to0$, $\beta\to0$ recover the infinite-dimensional AIRM, while $\alpha=\beta=1$ yields the infinite-dimensional Stein divergence (Quang, 2017, Quang, 2016).

For regularized kernel covariance operators ($C_X$, $C_Y$) in a reproducing kernel Hilbert space (RKHS), the αβ-log-det divergence reduces to a Gram-matrix formula:

$$D_{\alpha,\beta}(C_X,C_Y) = \frac{1}{\alpha\beta}\log \frac{ \det \left(\alpha\, (\frac{1}{n}K_X + \lambda I) + \beta\, (\frac{1}{m}K_Y + \lambda I ) \right) }{ \left( \det (\frac{1}{n}K_X + \lambda I)\right )^{\alpha} \left( \det (\frac{1}{m}K_Y + \lambda I) \right )^{\beta} }$$

providing a computational path for infinite-dimensional divergences in learning applications (Quang, 2016).

5. Connections to Gaussian and Information Geometric Divergences

For multivariate normal densities $p = N(\mu_1, \Sigma_1)$ and $q = N(\mu_2, \Sigma_2)$, the continuous gamma divergence $D^{(\alpha,\beta)}_{AC}(p \,\|\, q)$ is directly expressible via the αβ-log-det divergence:

$$D^{(\alpha,\beta)}_{AC}(p\,\|\,q) = \frac12 D^{(\alpha,\beta)}_{AB}( \Sigma_2^{-1} \,\|\, \Sigma_1^{-1} ) + \frac{ (\mu_1 - \mu_2)^{T} S^{-1} (\mu_1 - \mu_2) }{ 2(\alpha + \beta) }$$

with $S = \frac{\alpha}{\alpha+\beta} \Sigma_1 + \frac{\beta}{\alpha+\beta} \Sigma_2$ (Cichocki et al., 2014).

Special cases:

  • $(\alpha=1, \beta\to0)$: Kullback-Leibler divergence
  • $(\alpha=\beta=1/2)$: Bhattacharyya distance
  • $(\alpha+\beta=1)$: Rényi divergence of order $\alpha$
  • $(\alpha=\beta=1)$: Cauchy-Schwarz divergence

This reveals that αβ-log-det divergences not only cover matrix-level divergences but also bridge to statistical divergences between distributions.

6. Symmetrizations and Metric Properties

The αβ-log-det divergence is asymmetric in general. Two canonical symmetrizations are employed (Cichocki et al., 2014):

  1. Type-1 (Jeffreys-style):

$$D^{(\alpha, \beta)}_{AB, sym1}(P, Q) = \frac12\left[ D^{(\alpha, \beta)}_{AB}(P\,\|\,Q) + D^{(\alpha, \beta)}_{AB}(Q\,\|\,P) \right]$$

  2. Type-2 (Jensen–Shannon style):

$$D^{(\alpha, \beta)}_{AB, sym2}(P, Q) = \frac12\left[ D^{(\alpha, \beta)}_{AB}\left(P\,\Big\|\,\frac{P+Q}{2}\right) + D^{(\alpha, \beta)}_{AB}\left(Q\,\Big\|\,\frac{P+Q}{2}\right) \right]$$

Type-1 subsumes the Jeffreys-KL divergence (when $\alpha=0, \beta=1$ or vice versa). When $\alpha=\beta$, the divergence itself is already symmetric in $P$ and $Q$ by the dual symmetry property, so the Type-1 symmetrization leaves it unchanged. For $\alpha=\beta=0$, the square root of the divergence is proportional to the affine-invariant Riemannian distance, which satisfies the triangle inequality and endows the space of SPD matrices with a geodesic structure (Cichocki et al., 2014).
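Both symmetrizations are thin wrappers around the base divergence. The sketch below is a minimal illustration for the generic case where $\alpha$, $\beta$, and $\alpha+\beta$ are all nonzero. The function names `sym1` and `sym2` and the example matrices are assumptions of this example, not notation from the cited work.

```python
import numpy as np

def rel_eigs(P, Q):
    """Eigenvalues of P^{-1} Q via the symmetric matrix L^{-1} Q L^{-T}, P = L L^T."""
    L = np.linalg.cholesky(P)
    S = np.linalg.solve(L, np.linalg.solve(L, Q).T).T
    return np.linalg.eigvalsh((S + S.T) / 2)

def ab_logdet(P, Q, alpha, beta):
    lam = rel_eigs(P, Q)
    terms = (alpha * lam**beta + beta * lam**(-alpha)) / (alpha + beta)
    return np.sum(np.log(terms)) / (alpha * beta)

def sym1(P, Q, alpha, beta):
    """Type-1 (Jeffreys-style): average of the two argument orders."""
    return 0.5 * (ab_logdet(P, Q, alpha, beta) + ab_logdet(Q, P, alpha, beta))

def sym2(P, Q, alpha, beta):
    """Type-2 (Jensen-Shannon style): average divergence to the arithmetic mean."""
    mid = (P + Q) / 2
    return 0.5 * (ab_logdet(P, mid, alpha, beta) + ab_logdet(Q, mid, alpha, beta))

P = np.array([[2.0, 0.5], [0.5, 1.0]])
Q = np.array([[1.0, -0.2], [-0.2, 3.0]])
a, b = 1.5, 0.5

print(ab_logdet(P, Q, a, b), ab_logdet(Q, P, a, b))   # the base divergence, both argument orders
print(sym1(P, Q, a, b), sym1(Q, P, a, b))             # equal by construction
print(sym2(P, Q, a, b), sym2(Q, P, a, b))             # equal by construction
print(np.isclose(sym1(P, Q, 0.5, 0.5), ab_logdet(P, Q, 0.5, 0.5)))  # alpha = beta case
```

The last line reflects that on the diagonal $\alpha=\beta$ the base divergence is already symmetric, so Type-1 symmetrization changes nothing there.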

7. Applications, Learning, and Multiway Extensions

Recent work exploits the αβ-log-det divergence as a learnable meta-divergence for applications requiring similarity assessment between SPD matrices (Cherian et al., 2021). In supervised and unsupervised tasks (e.g., discriminative dictionary learning, clustering), the parameters $(\alpha,\beta)$, which may even be vector-valued, are optimized jointly with SPD dictionaries/centroids via Riemannian optimization schemes, harnessing the flexibility of the divergence family.

Empirical evaluation on multiple vision benchmarks demonstrates the advantage of automatically selecting from the αβ-family, with per-dictionary-atom vector-valued parameters yielding further performance gains (Cherian et al., 2021).

A multiway (Kronecker-separable) extension exists for block-covariance structures. For $\Sigma_P, \Sigma_Q$ with Kronecker decompositions, the divergence splits into a sum of per-mode αβ-log-det divergences and a scale term, extending Hilbert, AIRM, and Stein divergences to the multi-tensor setting (Cichocki et al., 2014):

$$D^{(\alpha,\beta)}_{AB}(\Sigma_P\,\|\,\Sigma_Q) = D^{(\alpha,\beta)}_{AB}\left(\frac{\sigma^2_P}{\sigma^2_Q} I_N\,\Big\|\,I_N\right) + \sum_{k=1}^K \frac{N}{n_k}\, D^{(\alpha,\beta)}_{AB}(\Sigma_{P,k}\,\|\,\Sigma_{Q,k})$$

This suggests a natural fit for multi-modal and tensor-valued covariance modeling, especially for multiway Gaussian models or tensor factor analysis.


The αβ-log-det divergence family provides a unified, parameterized, and geometrically well-motivated divergence for SPD matrices and operators, enabling fine control over spectrum sensitivity and metric properties, with theoretical guarantees and proven utility in information geometry, statistical learning, and high-dimensional covariance modeling (Cichocki et al., 2014, Cherian et al., 2021, Quang, 2017, Quang, 2016).
