
Matrix-based Rényi Entropy

Updated 20 December 2025
  • Matrix-based Rényi entropy is a kernel-based functional that estimates entropy directly from the eigen-spectrum of normalized Gram matrices without explicit density estimation.
  • It employs randomized numerical linear algebra and low-rank approximations to overcome the cubic cost of traditional eigendecomposition, ensuring scalability for large datasets.
  • The framework extends to multivariate, conditional, and quantum settings, enabling applications in deep learning, feature selection, and quantification of quantumness.

Matrix-based Rényi entropy is a functional that enables direct estimation of information-theoretic quantities from data via the spectrum of kernel (Gram) matrices, bypassing explicit density estimation. The framework extends Rényi entropy, a one-parameter generalization of Shannon entropy, to structured data, random processes, and multivariate settings by replacing powers of probabilities with order-$\alpha$ matrix powers, making it broadly applicable in machine learning, information theory, and quantum information. Developments in randomized numerical linear algebra and low-rank representations have yielded scalable and robust computation schemes for large-scale data.

1. Definition and Core Properties

Given $n$ samples $\{x_1, \ldots, x_n\}$, one constructs a symmetric positive semidefinite (SPD) kernel (Gram) matrix $A \in \mathbb{R}^{n \times n}$. After normalization so that $\operatorname{tr}(A) = 1$, the matrix-based Rényi entropy of order $\alpha \in \mathbb{R}^+$, $\alpha \neq 1$, is defined by

$$H_\alpha(A) = \frac{1}{1-\alpha} \, \log\left( \operatorname{tr}(A^\alpha) \right) = \frac{1}{1-\alpha} \, \log\left( \sum_{i=1}^n \lambda_i^\alpha \right),$$

where $\{\lambda_i\}$ are the eigenvalues of $A$ (Dong et al., 2022).

For $\alpha \to 1$, $H_\alpha(A)$ converges to the matrix-based analogue of Shannon entropy. The choice of kernel and normalization guarantees that $0 \leq \lambda_i \leq 1$ and $H_\alpha(A) \geq 0$. Values of $\alpha$ below 2 emphasize low-eigenvalue (tail) structure; values above 2 emphasize the leading eigenmodes.
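As a concrete illustration, the sketch below evaluates $H_\alpha(A)$ from raw samples via a full eigendecomposition. The RBF kernel, the bandwidth `sigma`, and the function names are illustrative choices, not prescribed by the cited papers.

```python
import numpy as np

def gram_matrix(X, sigma=1.0):
    """RBF (Gaussian) kernel Gram matrix for samples X of shape (n, d)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def renyi_entropy(K, alpha=2.0):
    """Matrix-based Renyi entropy of order alpha for a Gram matrix K.

    K is normalized to A = K / tr(K) so that tr(A) = 1, and
    H_alpha(A) = 1/(1 - alpha) * log(sum_i lambda_i ** alpha).
    """
    A = K / np.trace(K)
    lam = np.clip(np.linalg.eigvalsh(A), 0.0, None)  # eigenvalues of the SPD matrix A
    return np.log(np.sum(lam ** alpha)) / (1.0 - alpha)

# Example: entropy of 500 two-dimensional Gaussian samples
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
print(renyi_entropy(gram_matrix(X), alpha=2.0))
```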

This definition subsumes classical, quantum, and nonparametric data-driven settings (Reisizadeh et al., 2016, Yu et al., 2018).

2. Computational Considerations and Randomized Approximations

The direct computation of $H_\alpha(A)$ via eigendecomposition has $O(n^3)$ time and $O(n^2)$ memory complexity, which is prohibitive for large $n$. To address scalability, stochastic trace estimation is used: $\operatorname{tr}(f(A)) = \mathbb{E}_v\!\left[v^\top f(A)\, v\right]$, where $v$ is a random probe vector (Gaussian or Rademacher). The empirical estimator

$$\widehat{\operatorname{tr}(f(A))} = \frac{1}{s} \sum_{j=1}^s v_j^\top f(A)\, v_j$$

converts the trace into a sum of matrix-vector products (Dong et al., 2022, Gong et al., 2021).

For integer $\alpha$, implicit powers can be computed iteratively as $A^\alpha v = A(\ldots(A v)\ldots)$; for non-integer $\alpha$, polynomial approximations (Taylor, Chebyshev) or Lanczos quadrature are effective. The total complexity reduces to $O(n^2 s m)$ for dense matrices, where $s$ is the number of probe vectors and $m$ the polynomial degree, both sublinear in $n$. Rigorous error bounds are established for these methods, with theoretical guarantees matching minimax lower bounds up to logarithmic factors (Dong et al., 2022, Gong et al., 2021).
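A minimal sketch of this scheme for integer $\alpha \geq 2$ is given below, using Rademacher probes and repeated matrix-vector products. The function name and the probe count `s` are illustrative assumptions, not the exact algorithm of the cited papers (which also cover non-integer $\alpha$ via polynomial or Lanczos approximations).

```python
import numpy as np

def renyi_entropy_hutchinson(A, alpha=3, s=50, seed=None):
    """Estimate H_alpha(A) for a trace-normalized SPD matrix A and integer alpha >= 2.

    Uses Hutchinson's stochastic trace estimator:
    tr(A^alpha) ~ (1/s) * sum_j v_j^T A^alpha v_j,
    where each A^alpha v_j is formed by alpha successive matrix-vector products.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    estimates = np.empty(s)
    for j in range(s):
        v = rng.choice([-1.0, 1.0], size=n)   # Rademacher probe vector
        w = v.copy()
        for _ in range(alpha):                # implicit power: A(...(A v)...)
            w = A @ w
        estimates[j] = v @ w
    return np.log(estimates.mean()) / (1.0 - alpha)
```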

Block low-rank approximations further accelerate computation in the presence of structure, e.g., after clustering matrix rows/columns (Gong et al., 2021).

3. Multivariate and Joint Extensions

Matrix-based Rényi entropy has been extended to joint, conditional, and multivariate cases relevant for mutual information and interaction information estimation. For $m$ random variables $X^{(1)}, \ldots, X^{(m)}$ with Gram matrices $\{K^{(\ell)}\}$ and normalized densities $\rho^{(\ell)} = K^{(\ell)}/\operatorname{tr}(K^{(\ell)})$, the joint entropy uses the Hadamard product:

$$K^{(1\ldots m)} = K^{(1)} \circ K^{(2)} \circ \cdots \circ K^{(m)}, \qquad \rho^{(1\ldots m)} = K^{(1\ldots m)}/\operatorname{tr}\bigl(K^{(1\ldots m)}\bigr),$$

$$H_\alpha\bigl(X^{(1)}, \ldots, X^{(m)}\bigr) = \frac{1}{1-\alpha} \log \operatorname{tr}\bigl[(\rho^{(1\ldots m)})^\alpha\bigr].$$

Analogous forms yield matrix-based mutual information, total correlation, and a suite of interaction-information quantities (Yu et al., 2018). The resulting functionals are symmetric, subadditive, and admit tight bounds connecting marginal and joint entropies.
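A self-contained sketch of the bivariate case follows; the Gram matrices `K_x` and `K_y` are assumed to be built from paired samples with a normalized kernel, and the function names are illustrative.

```python
import numpy as np

def renyi_entropy(K, alpha=2.0):
    """H_alpha of the trace-normalized Gram matrix K (see Section 1)."""
    lam = np.clip(np.linalg.eigvalsh(K / np.trace(K)), 0.0, None)
    return np.log(np.sum(lam ** alpha)) / (1.0 - alpha)

def mutual_information(K_x, K_y, alpha=2.0):
    """Matrix-based mutual information I_alpha(X; Y) = H(X) + H(Y) - H(X, Y).

    The joint Gram matrix is the Hadamard (elementwise) product K_x * K_y,
    renormalized to unit trace inside renyi_entropy.
    """
    return (renyi_entropy(K_x, alpha) + renyi_entropy(K_y, alpha)
            - renyi_entropy(K_x * K_y, alpha))
```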

4. Matrix Inequalities, Quantum Context, and Theoretical Bounds

In the quantum case, $A = \rho$ is a density matrix. The Rényi relative entropy

$$D_\alpha(\rho \Vert \sigma) = \frac{1}{\alpha-1} \log \operatorname{tr}\bigl[\rho^\alpha \sigma^{1-\alpha}\bigr]$$

gives rise to entropic bounds on conditional and mutual information:

$$H_\alpha(A|B)_\rho = -D_\alpha(\rho_{AB} \Vert I_A \otimes \rho_B), \qquad I_\alpha(A;B)_\rho = D_\alpha(\rho_{AB} \Vert \rho_A \otimes \rho_B),$$

with tightness and equality characterized in terms of spectral properties (e.g., flat spectra, proportional supports).

Key bounds include:

  • Lower and upper bounds on $H_\alpha(\rho)$ depending only on the rank and nonzero spectrum.
  • Determinant-trace inequalities and log-det bounds for $D_\alpha(\rho \Vert \sigma)$ (Reisizadeh et al., 2016).

Such results allow the replacement of full eigendecomposition by easier-to-compute determinant or trace constraints, which are more accessible for quantum coding theorems and physical experiments.
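For reference, a minimal numerical sketch of the Petz-type relative entropy defined above, assuming Hermitian PSD inputs with compatible supports; the helper names are illustrative.

```python
import numpy as np

def herm_power(M, p):
    """Fractional power of a Hermitian PSD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.clip(w, 0.0, None) ** p) @ V.conj().T

def renyi_relative_entropy(rho, sigma, alpha=0.5):
    """Petz-type quantum Renyi relative entropy
    D_alpha(rho || sigma) = 1/(alpha - 1) * log tr[rho^alpha sigma^(1 - alpha)],
    assuming supp(rho) is contained in supp(sigma)."""
    val = np.trace(herm_power(rho, alpha) @ herm_power(sigma, 1.0 - alpha))
    return float(np.log(np.real(val)) / (alpha - 1.0))
```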

5. Low-Rank Matrix-based Rényi Entropy and Robust Approximations

To remedy sensitivity to noise and further enhance scalability, low-rank variants have been introduced. The low-rank matrix-based Rényi entropy retains only the leading $k$ eigenvalues $\lambda_1, \ldots, \lambda_k$:

$$H^k_\alpha(G) = \frac{1}{1-\alpha}\log_2\!\left(\sum_{i=1}^k \lambda_i^\alpha + (n-k)\,\lambda_r^\alpha\right),$$

where $\lambda_r = \bigl(1 - \sum_{i=1}^k \lambda_i\bigr)/(n-k)$. This truncation makes $H^k_\alpha(G)$ more sensitive to informative perturbations (modifying top eigenmodes) and less sensitive to noise (spread across the tail), providing demonstrably improved robustness (Dong et al., 2022). Lanczos and random projection methods afford $\mathcal{O}(n^2 s)$ or $\mathcal{O}(n s^2)$ computation for large $n$.
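A sketch of this low-rank estimator is given below, computing only the top-$k$ eigenvalues with SciPy's Lanczos solver `eigsh`. The base-2 logarithm and the residual eigenvalue follow the formula above; the function name and the assumption $k \ll n$ are illustrative choices.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def low_rank_renyi_entropy(K, k=50, alpha=2.0):
    """Low-rank matrix-based Renyi entropy H^k_alpha(G).

    Keeps the k largest eigenvalues of the trace-normalized Gram matrix and
    spreads the residual mass (1 - sum of top-k) uniformly over the remaining
    n - k eigenvalues. Requires k < n.
    """
    n = K.shape[0]
    A = K / np.trace(K)
    top = eigsh(A, k=k, which='LM', return_eigenvectors=False)  # Lanczos, top-k
    top = np.clip(top, 0.0, None)
    lam_r = max(1.0 - top.sum(), 0.0) / (n - k)                 # residual "flat" eigenvalue
    return np.log2(np.sum(top ** alpha) + (n - k) * lam_r ** alpha) / (1.0 - alpha)
```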

Empirical results confirm speedups (up to $15\times$ over full-matrix methods) and negligible loss in accuracy for tasks such as information bottleneck optimization and feature selection (Dong et al., 2022).

6. Generalizations, Cross-Entropy, and Axiomatic Properties

Matrix-based Rényi entropy and its relatives, such as $\alpha$-cross-entropies, are formulated in an RKHS using Gram matrices, enabling unbiased, nonparametric, and minimax-optimal estimation even for high-dimensional distributions (Sledge et al., 2021). For normalized empirical Gram matrices $K_P$, $K_Q$ (from samples of $P$ and $Q$), one has

$$C_\alpha(K_P \Vert K_Q) = \frac{1}{\alpha-1}\log \operatorname{tr}\bigl(K_P^{\alpha} K_Q^{1-\alpha}\bigr),$$

with generalizations (mirrored, tripartite). These satisfy all Rényi divergence axioms: non-negativity, continuity, monotonicity, additivity, and data-processing inequalities (for suitable $\alpha$ and Gram arguments).
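A self-contained sketch of this cross-entropy for empirical Gram matrices follows; the helper names are illustrative, and for $\alpha > 1$ the matrix $K_Q$ is assumed to have full support so that the negative fractional power is well defined.

```python
import numpy as np

def psd_power(M, p):
    """Fractional power of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.clip(w, 0.0, None) ** p) @ V.T

def renyi_cross_entropy(K_p, K_q, alpha=0.8):
    """Matrix-based alpha-cross-entropy C_alpha(K_P || K_Q).

    Both Gram matrices are trace-normalized; for alpha in (0, 1) both matrix
    powers are nonnegative, while alpha > 1 additionally requires K_q to be
    full rank.
    """
    A_p, A_q = K_p / np.trace(K_p), K_q / np.trace(K_q)
    val = np.trace(psd_power(A_p, alpha) @ psd_power(A_q, 1.0 - alpha))
    return np.log(val) / (alpha - 1.0)
```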

For pure-state quantum ensembles, Gram-matrix-based $\alpha$-$z$-Rényi coherence measures quantify ensemble quantumness in a rigorous resource-theoretic fashion, unifying Petz-type, sandwiched, and Tsallis entropies under a common umbrella and connecting with majorizer coherence and operational distinctions of quantumness (Yuan et al., 2022).

7. Applications and Empirical Evidence

Matrix-based Rényi entropy has been applied to a wide range of problems:

  • Information bottleneck and deep learning: scalable training of bottleneck-regularized networks on large datasets (CIFAR-10), with up to $50\times$ speedup and no loss of prediction accuracy for low-rank or randomized estimators (Dong et al., 2022, Dong et al., 2022).
  • Feature selection: robust and fast evaluation of mutual information for selecting informative features in high-dimensional classification (hyperspectral imaging, UCI datasets), outperforming classical PDF-based or histogram-based criteria (Yu et al., 2018, Dong et al., 2022).
  • Quantification of quantumness: computation of Gram-matrix Rényi coherence for pure-state quantum ensembles, with closed-form expressions for canonical state families (Yuan et al., 2022).
  • Stationary random processes: exact formulas for Rényi entropy rates of vector-valued Gaussian processes via the spectrum of block Toeplitz matrices (Mulherkar, 2018).

8. Open Problems and Future Perspectives

Active lines of research include:

  • Adaptive polynomial interpolation and error control in the $\alpha \to 1$ regime.
  • Extensions to streaming and distributed variants for ultra-large-scale datasets.
  • Extensions of Gram-matrix Rényi coherence from pure to mixed-state ensembles in quantum information.
  • Further theoretical investigation of the tradeoff between spectral truncation, robustness, and approximation error in low-rank entropy computation.

Matrix-based Rényi entropy and its algorithmic ecosystem thus provide an effective, theoretically grounded, and computationally tractable framework for information-theoretic analysis across machine learning, data science, and quantum domains (Dong et al., 2022, Gong et al., 2021, Dong et al., 2022, Sledge et al., 2021, Yuan et al., 2022).
