
Matrix-based Rényi Entropy

Updated 20 December 2025
  • Matrix-based Rényi entropy is a kernel-based functional that estimates entropy directly from the eigen-spectrum of normalized Gram matrices without explicit density estimation.
  • It employs randomized numerical linear algebra and low-rank approximations to overcome the cubic cost of traditional eigendecomposition, ensuring scalability for large datasets.
  • The framework extends to multivariate, conditional, and quantum settings, enabling applications in deep learning, feature selection, and quantification of quantumness.

Matrix-based Rényi entropy is a functional that enables direct estimation of information-theoretic quantities from data via the spectrum of kernel (Gram) matrices, bypassing explicit density estimation. The framework extends Rényi entropy, a one-parameter generalization of Shannon entropy, to structured data, random processes, and multivariate settings by replacing powers of probabilities with order-$\alpha$ matrix powers, making it broadly applicable in machine learning, information theory, and quantum information. Developments in randomized numerical linear algebra and low-rank representations have yielded scalable and robust computation schemes for large-scale data.

1. Definition and Core Properties

Given $n$ samples $\{x_1, \ldots, x_n\}$, one constructs a symmetric positive semidefinite (SPD) kernel (Gram) matrix $A \in \mathbb{R}^{n \times n}$. After normalization so that $\operatorname{tr}(A) = 1$, the matrix-based Rényi entropy of order $\alpha \in \mathbb{R}^+$, $\alpha \neq 1$, is defined by

$$H_\alpha(A) = \frac{1}{1-\alpha} \, \log\left( \operatorname{tr}(A^\alpha) \right) = \frac{1}{1-\alpha} \, \log\left( \sum_{i=1}^n \lambda_i^\alpha \right),$$

where $\{\lambda_i\}$ are the eigenvalues of $A$ (Dong et al., 2022).

For $\alpha \to 1$, $H_\alpha(A)$ converges to the matrix-based analogue of Shannon entropy. The choice of kernel and normalization guarantees that $0 \leq \lambda_i \leq 1$ and $H_\alpha(A) \geq 0$. Values of $\alpha$ below 2 emphasize low-eigenvalue (tail) structure; values above 2 emphasize the leading eigenmodes.
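As a concrete illustration, the sketch below evaluates $H_\alpha(A)$ from raw samples via a full eigendecomposition. The RBF kernel, the bandwidth `sigma`, and the function names are illustrative choices, not prescribed by the cited papers.

```python
import numpy as np

def gram_matrix(X, sigma=1.0):
    """RBF (Gaussian) kernel Gram matrix for samples X of shape (n, d)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def renyi_entropy(K, alpha=2.0):
    """Matrix-based Renyi entropy of order alpha for a Gram matrix K.

    K is normalized to A = K / tr(K) so that tr(A) = 1, and
    H_alpha(A) = 1/(1 - alpha) * log(sum_i lambda_i ** alpha).
    """
    A = K / np.trace(K)
    lam = np.clip(np.linalg.eigvalsh(A), 0.0, None)  # eigenvalues of the SPD matrix A
    return np.log(np.sum(lam ** alpha)) / (1.0 - alpha)

# Example: entropy of 500 two-dimensional Gaussian samples
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
print(renyi_entropy(gram_matrix(X), alpha=2.0))
```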

This definition subsumes classical, quantum, and nonparametric data-driven settings (Reisizadeh et al., 2016, Yu et al., 2018).

2. Computational Considerations and Randomized Approximations

The direct computation of $H_\alpha(A)$ via eigendecomposition has $O(n^3)$ time and $O(n^2)$ memory complexity, which is prohibitive for large $n$. To address scalability, stochastic trace estimation is used: $\operatorname{tr}(f(A)) = \mathbb{E}_v\!\left[v^\top f(A)\, v\right]$, where $v$ is a random probe vector (Gaussian or Rademacher). The empirical estimator

$$\widehat{\operatorname{tr}(f(A))} = \frac{1}{s} \sum_{j=1}^s v_j^\top f(A)\, v_j$$

converts the trace into a sum of matrix-vector products (Dong et al., 2022, Gong et al., 2021).

For integer $\alpha$, implicit powers can be computed iteratively as $A^\alpha v = A(\ldots(A v)\ldots)$; for non-integer $\alpha$, polynomial approximations (Taylor, Chebyshev) or Lanczos quadrature are effective. The total complexity reduces to $O(n^2 s m)$ for dense matrices, where $s$ is the number of probe vectors and $m$ the polynomial degree, both sublinear in $n$. Rigorous error bounds are established for these methods, with theoretical guarantees matching minimax lower bounds up to logarithmic factors (Dong et al., 2022, Gong et al., 2021).
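A minimal sketch of this scheme for integer $\alpha \geq 2$ is given below, using Rademacher probes and repeated matrix-vector products. The function name and the probe count `s` are illustrative assumptions, not the exact algorithm of the cited papers (which also cover non-integer $\alpha$ via polynomial or Lanczos approximations).

```python
import numpy as np

def renyi_entropy_hutchinson(A, alpha=3, s=50, seed=None):
    """Estimate H_alpha(A) for a trace-normalized SPD matrix A and integer alpha >= 2.

    Uses Hutchinson's stochastic trace estimator:
    tr(A^alpha) ~ (1/s) * sum_j v_j^T A^alpha v_j,
    where each A^alpha v_j is formed by alpha successive matrix-vector products.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    estimates = np.empty(s)
    for j in range(s):
        v = rng.choice([-1.0, 1.0], size=n)   # Rademacher probe vector
        w = v.copy()
        for _ in range(alpha):                # implicit power: A(...(A v)...)
            w = A @ w
        estimates[j] = v @ w
    return np.log(estimates.mean()) / (1.0 - alpha)
```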

Block low-rank approximations further accelerate computation in the presence of structure, e.g., after clustering matrix rows/columns (Gong et al., 2021).

3. Multivariate and Joint Extensions

Matrix-based Rényi entropy has been extended to joint, conditional, and multivariate cases relevant for mutual information and interaction information estimation. For $m$ random variables $X^{(1)}, \ldots, X^{(m)}$ with Gram matrices $\{K^{(\ell)}\}$ and normalized densities $\rho^{(\ell)} = K^{(\ell)}/\operatorname{tr}(K^{(\ell)})$, the joint entropy uses the Hadamard product:

$$K^{(1\ldots m)} = K^{(1)} \circ K^{(2)} \circ \cdots \circ K^{(m)}, \qquad \rho^{(1\ldots m)} = K^{(1\ldots m)}/\operatorname{tr}\bigl(K^{(1\ldots m)}\bigr),$$

$$H_\alpha\bigl(X^{(1)}, \ldots, X^{(m)}\bigr) = \frac{1}{1-\alpha} \log \operatorname{tr}\bigl[(\rho^{(1\ldots m)})^\alpha\bigr].$$

Analogous forms yield matrix-based mutual information, total correlation, and a suite of interaction-information quantities (Yu et al., 2018). The resulting functionals are symmetric, subadditive, and admit tight bounds connecting marginal and joint entropies.
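A self-contained sketch of the bivariate case follows; the Gram matrices `K_x` and `K_y` are assumed to be built from paired samples with a normalized kernel, and the function names are illustrative.

```python
import numpy as np

def renyi_entropy(K, alpha=2.0):
    """H_alpha of the trace-normalized Gram matrix K (see Section 1)."""
    lam = np.clip(np.linalg.eigvalsh(K / np.trace(K)), 0.0, None)
    return np.log(np.sum(lam ** alpha)) / (1.0 - alpha)

def mutual_information(K_x, K_y, alpha=2.0):
    """Matrix-based mutual information I_alpha(X; Y) = H(X) + H(Y) - H(X, Y).

    The joint Gram matrix is the Hadamard (elementwise) product K_x * K_y,
    renormalized to unit trace inside renyi_entropy.
    """
    return (renyi_entropy(K_x, alpha) + renyi_entropy(K_y, alpha)
            - renyi_entropy(K_x * K_y, alpha))
```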

4. Matrix Inequalities, Quantum Context, and Theoretical Bounds

In the quantum case, $A = \rho$ is a density matrix. The Rényi relative entropy

$$D_\alpha(\rho \Vert \sigma) = \frac{1}{\alpha-1} \log \operatorname{tr}\bigl[\rho^\alpha \sigma^{1-\alpha}\bigr]$$

gives rise to entropic bounds on conditional and mutual information:

$$H_\alpha(A|B)_\rho = -D_\alpha(\rho_{AB} \Vert I_A \otimes \rho_B), \qquad I_\alpha(A;B)_\rho = D_\alpha(\rho_{AB} \Vert \rho_A \otimes \rho_B),$$

with tightness and equality characterized in terms of spectral properties (e.g., flat spectra, proportional supports).

Key bounds include:

  • Lower and upper bounds on $H_\alpha(\rho)$ depending only on the rank and nonzero spectrum.
  • Determinant-trace inequalities and log-det bounds for $D_\alpha(\rho \Vert \sigma)$ (Reisizadeh et al., 2016).

Such results allow the replacement of full eigendecomposition by easier-to-compute determinant or trace constraints, which are more accessible for quantum coding theorems and physical experiments.
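For reference, a minimal numerical sketch of the Petz-type relative entropy defined above, assuming Hermitian PSD inputs with compatible supports; the helper names are illustrative.

```python
import numpy as np

def herm_power(M, p):
    """Fractional power of a Hermitian PSD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.clip(w, 0.0, None) ** p) @ V.conj().T

def renyi_relative_entropy(rho, sigma, alpha=0.5):
    """Petz-type quantum Renyi relative entropy
    D_alpha(rho || sigma) = 1/(alpha - 1) * log tr[rho^alpha sigma^(1 - alpha)],
    assuming supp(rho) is contained in supp(sigma)."""
    val = np.trace(herm_power(rho, alpha) @ herm_power(sigma, 1.0 - alpha))
    return float(np.log(np.real(val)) / (alpha - 1.0))
```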

5. Low-Rank Matrix-based Rényi Entropy and Robust Approximations

To remedy sensitivity to noise and further enhance scalability, low-rank variants have been introduced. The low-rank matrix-based Rényi entropy retains only the leading $k$ eigenvalues $\lambda_1, \ldots, \lambda_k$:

$$H^k_\alpha(G) = \frac{1}{1-\alpha}\log_2\!\left(\sum_{i=1}^k \lambda_i^\alpha + (n-k)\,\lambda_r^\alpha\right),$$

where $\lambda_r = \bigl(1 - \sum_{i=1}^k \lambda_i\bigr)/(n-k)$. This truncation makes $H^k_\alpha(G)$ more sensitive to informative perturbations (modifying top eigenmodes) and less sensitive to noise (spread across the tail), providing demonstrably improved robustness (Dong et al., 2022). Lanczos and random projection methods afford $\mathcal{O}(n^2 s)$ or $\mathcal{O}(n s^2)$ computation for large $n$.
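A sketch of this low-rank estimator is given below, computing only the top-$k$ eigenvalues with SciPy's Lanczos solver `eigsh`. The base-2 logarithm and the residual eigenvalue follow the formula above; the function name and the assumption $k \ll n$ are illustrative choices.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def low_rank_renyi_entropy(K, k=50, alpha=2.0):
    """Low-rank matrix-based Renyi entropy H^k_alpha(G).

    Keeps the k largest eigenvalues of the trace-normalized Gram matrix and
    spreads the residual mass (1 - sum of top-k) uniformly over the remaining
    n - k eigenvalues. Requires k < n.
    """
    n = K.shape[0]
    A = K / np.trace(K)
    top = eigsh(A, k=k, which='LM', return_eigenvectors=False)  # Lanczos, top-k
    top = np.clip(top, 0.0, None)
    lam_r = max(1.0 - top.sum(), 0.0) / (n - k)                 # residual "flat" eigenvalue
    return np.log2(np.sum(top ** alpha) + (n - k) * lam_r ** alpha) / (1.0 - alpha)
```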

Empirical results confirm speedups (up to $15\times$ over full-matrix methods) and negligible loss in accuracy for tasks such as information bottleneck optimization and feature selection (Dong et al., 2022).

6. Generalizations, Cross-Entropy, and Axiomatic Properties

Matrix-based Rényi entropy and its relatives, such as $\alpha$-cross-entropies, are formulated in an RKHS using Gram matrices, enabling unbiased, nonparametric, and minimax-optimal estimation even for high-dimensional distributions (Sledge et al., 2021). For normalized empirical Gram matrices $K_P$, $K_Q$ (from samples of $P$ and $Q$), one has

$$C_\alpha(K_P \Vert K_Q) = \frac{1}{\alpha-1}\log \operatorname{tr}\bigl(K_P^{\alpha} K_Q^{1-\alpha}\bigr),$$

with generalizations (mirrored, tripartite). These satisfy all Rényi divergence axioms: non-negativity, continuity, monotonicity, additivity, and data-processing inequalities (for suitable $\alpha$ and Gram arguments).
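A self-contained sketch of this cross-entropy for empirical Gram matrices follows; the helper names are illustrative, and for $\alpha > 1$ the matrix $K_Q$ is assumed to have full support so that the negative fractional power is well defined.

```python
import numpy as np

def psd_power(M, p):
    """Fractional power of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.clip(w, 0.0, None) ** p) @ V.T

def renyi_cross_entropy(K_p, K_q, alpha=0.8):
    """Matrix-based alpha-cross-entropy C_alpha(K_P || K_Q).

    Both Gram matrices are trace-normalized; for alpha in (0, 1) both matrix
    powers are nonnegative, while alpha > 1 additionally requires K_q to be
    full rank.
    """
    A_p, A_q = K_p / np.trace(K_p), K_q / np.trace(K_q)
    val = np.trace(psd_power(A_p, alpha) @ psd_power(A_q, 1.0 - alpha))
    return np.log(val) / (alpha - 1.0)
```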

For pure-state quantum ensembles, Gram-matrix-based $\alpha$-$z$-Rényi coherence measures quantify ensemble quantumness in a rigorous resource-theoretic fashion, unifying Petz-type, sandwiched, and Tsallis entropies under a common umbrella and connecting with majorizer coherence and operational distinctions of quantumness (Yuan et al., 2022).

7. Applications and Empirical Evidence

Matrix-based Rényi entropy has been applied to a wide range of problems:

  • Information bottleneck and deep learning: scalable training of bottleneck-regularized networks on large datasets (CIFAR-10), with up to $50\times$ speedup and no loss of prediction accuracy for low-rank or randomized estimators (Dong et al., 2022, Dong et al., 2022).
  • Feature selection: robust and fast evaluation of mutual information for selecting informative features in high-dimensional classification (hyperspectral imaging, UCI datasets), outperforming classical PDF-based or histogram-based criteria (Yu et al., 2018, Dong et al., 2022).
  • Quantification of quantumness: computation of Gram-matrix Rényi coherence for pure-state quantum ensembles, with closed-form expressions for canonical state families (Yuan et al., 2022).
  • Stationary random processes: exact formulas for Rényi entropy rates of vector-valued Gaussian processes via the spectrum of block Toeplitz matrices (Mulherkar, 2018).

8. Open Problems and Future Perspectives

Active lines of research include:

  • Adaptive polynomial interpolation and error control in the $\alpha \to 1$ regime.
  • Extensions to streaming and distributed variants for ultra-large-scale datasets.
  • Extensions of Gram-matrix Rényi coherence from pure to mixed-state ensembles in quantum information.
  • Further theoretical investigation of the tradeoff between spectral truncation, robustness, and approximation error in low-rank entropy computation.

Matrix-based Rényi entropy and its algorithmic ecosystem thus provide an effective, theoretically grounded, and computationally tractable framework for information-theoretic analysis across machine learning, data science, and quantum domains (Dong et al., 2022, Gong et al., 2021, Dong et al., 2022, Sledge et al., 2021, Yuan et al., 2022).
