Joint Diagonalization

Updated 12 March 2026

Joint diagonalization is the process of finding a common basis in which multiple matrices become (approximately) diagonal; it is a core step in algorithms for spectral analysis and signal processing.
Methods include Jacobi sweeps, Riemannian optimization, and randomized approaches, all of which aim to minimize off-diagonal energy.
Applications span blind source separation, multivariate statistics, quantum information, and large-scale data analysis, where robustness and scalability are the main challenges.

Joint diagonalization is the process of finding a common basis in which two or more matrices become (approximately) diagonal simultaneously under a similarity or congruence transformation. Originating in the study of commuting operators, the concept is central to spectral analysis, multivariate statistics, blind source separation, quantum information, and many algorithmic disciplines. In contemporary practice, joint diagonalization is typically formulated as the optimization of an off-diagonal energy functional over a matrix group such as the orthogonal, unitary, or general linear group. This framework underpins a large family of modern algorithms, theoretical guarantees, and application-specific variants.

1. Mathematical Foundations: Exact and Approximate Joint Diagonalization

Given matrices $A_1, A_2, \dots, A_m$ in $\mathbb{C}^{n \times n}$ (or $\mathbb{R}^{n \times n}$), joint diagonalization asks for an invertible (or orthogonal/unitary) matrix $U$ such that each transformed matrix $U^{-1}A_kU$ is diagonal. For self-adjoint (Hermitian) matrices, exact joint diagonalizability by a unitary matrix is equivalent to pairwise commutativity: $A_kA_\ell = A_\ell A_k\quad\forall k, \ell \iff \exists U \text{ unitary s.t. } U^*A_kU \text{ is diagonal } \forall k.$ In most practical settings, the matrices are only approximately commuting, motivating the approximate joint diagonalization (AJD) problem: $\min_{U\in\mathcal{G}}\; \sum_{k=1}^m \mathrm{off}(U^{-1}A_kU),$ where $\mathcal{G}$ is typically the general linear group $GL(n)$ or its orthogonal/unitary subgroup, and $\mathrm{off}(M) = \|M - \mathrm{diag}(M)\|_F^2$.
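As a small worked instance of the definitions above, the following sketch evaluates $\mathrm{off}(\cdot)$ for a pair of commuting real symmetric matrices before and after a change of basis. The matrices `A` and `B`, the basis `U`, and the helper name `off` are illustrative choices, not taken from any of the cited works.

```python
import numpy as np

def off(M):
    """Off-diagonal energy: squared Frobenius norm of M minus its diagonal part."""
    M = np.asarray(M)
    R = M - np.diag(np.diag(M))
    return float(np.sum(np.abs(R) ** 2))

# Two real symmetric matrices that commute (illustrative values).
A = np.array([[2.0, 1.0], [1.0, 2.0]])
B = np.array([[0.0, 3.0], [3.0, 0.0]])

# Orthogonal basis built from the shared eigenvectors (1, 1) and (1, -1).
U = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)

commutator_norm = np.linalg.norm(A @ B - B @ A)    # zero: A and B commute
cost_identity = off(A) + off(B)                    # 2 + 18 = 20 in the original basis
cost_joint = off(U.T @ A @ U) + off(U.T @ B @ U)   # zero up to rounding

print(commutator_norm, cost_identity, cost_joint)
```

In the basis `U` the matrices become diag(3, 1) and diag(3, -3), so the AJD objective drops from 20 to zero (up to rounding), as the commutativity criterion predicts.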

A key analytical result for Hermitian $A, B$ establishes bounds relating the Frobenius norm of the commutator $[A, B] = AB - BA$ and the minimal total off-diagonal energy: $\epsilon_1(\|[A,B]\|_F) \le J(A,B) \le \epsilon_2(\|[A,B]\|_F),$ where $J(A,B) = \min_{U \text{ unitary}} \left[\mathrm{off}(U^*AU) + \mathrm{off}(U^*BU)\right]$, and each $\epsilon_i$ (which may depend on the dimension $n$) tends to zero as its argument vanishes. Almost-commuting matrices are thus almost jointly diagonalizable, both in upper and lower bound senses (Glashoff et al., 2013).
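The qualitative link between commutator norm and off-diagonal energy can be observed numerically. The sketch below perturbs an exactly commuting symmetric pair by a hypothetical noise level `eps` and reports the commutator norm together with the off-diagonal energy in the eigenbasis of the first matrix. That basis is only one candidate diagonalizer, so the reported energy is an upper bound on the minimal energy, not the minimum itself. The function name, matrix sizes, and spectra are assumptions made for illustration.

```python
import numpy as np

def off(M):
    """Squared Frobenius norm of the off-diagonal part of M."""
    R = M - np.diag(np.diag(M))
    return float(np.sum(R ** 2))

def commutator_vs_off(eps, n=6, seed=0):
    """Return (||[A, B]||_F, off-diagonal energy in the eigenbasis of A)."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal basis
    A0 = Q @ np.diag(np.arange(1.0, n + 1)) @ Q.T      # well-separated spectrum
    B0 = Q @ np.diag(rng.standard_normal(n)) @ Q.T     # commutes with A0
    E = rng.standard_normal((n, n))
    F = rng.standard_normal((n, n))
    A = A0 + eps * (E + E.T) / 2                       # symmetric perturbations
    B = B0 + eps * (F + F.T) / 2
    comm = float(np.linalg.norm(A @ B - B @ A))
    _, U = np.linalg.eigh(A)                           # candidate diagonalizer
    energy = off(U.T @ A @ U) + off(U.T @ B @ U)
    return comm, energy

for eps in (0.0, 1e-3, 1e-1):
    print(eps, commutator_vs_off(eps))
```

With `eps = 0` both quantities vanish up to rounding, and both grow as the perturbation increases.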

2. Algorithmic Methodologies for Joint Diagonalization

Several algorithmic regimes have emerged to address the joint diagonalization problem:

A. Jacobi-Sweep (JADE, CJDi, Hybrid Variants):

Iteratively annihilate the off-diagonal entries of the matrices using plane rotations (Givens for orthogonal/unitary JD, shear/hyperbolic for non-orthogonal cases), sweeping over all index pairs. For the symmetric or Hermitian setting, these methods optimize a cost functional such as

$J(U) = \sum_{k=1}^m \mathrm{off}(U^*A_kU)$

over unitary (or, in the real case, orthogonal) $U$, updating it by sequences of rotations (Eynard et al., 2012, Mesloub et al., 2013, Nait-Meziane et al., 2018). Hybrid JD accommodates mixed congruence and transpose-JD scenarios.
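A minimal sketch of an orthogonal Jacobi sweep for real symmetric inputs is given below. It uses the classical closed-form Givens angle obtained from the 2x2 matrix that collects the (p, q) statistics of all inputs. The function names `total_off` and `jacobi_joint_diag`, the stopping rule, and the tolerance are assumptions for illustration; this is not the implementation of any specific cited paper.

```python
import numpy as np

def total_off(mats):
    """Sum over all matrices of the squared off-diagonal entries."""
    A = np.asarray(mats, dtype=float)
    mask = ~np.eye(A.shape[-1], dtype=bool)
    return float(np.sum(A[:, mask] ** 2))

def jacobi_joint_diag(mats, max_sweeps=100, tol=1e-12):
    """Return (V, D): orthogonal V and the stack D[k] = V.T @ mats[k] @ V."""
    A = np.array(mats, dtype=float)          # working copy, shape (m, n, n)
    n = A.shape[1]
    V = np.eye(n)
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                # Per-matrix statistics for the (p, q) plane.
                g = np.stack([A[:, p, p] - A[:, q, q], 2.0 * A[:, p, q]])
                gg = g @ g.T
                ton = gg[0, 0] - gg[1, 1]
                toff = 2.0 * gg[0, 1]
                # Inner rotation angle, |theta| <= pi/4.
                theta = 0.25 * np.arctan2(toff, ton)
                c, s = np.cos(theta), np.sin(theta)
                if abs(s) < tol:
                    continue
                rotated = True
                G = np.array([[c, -s], [s, c]])
                # A_k <- G^T A_k G on rows and columns p, q; V <- V G.
                A[:, [p, q], :] = np.einsum('ji,kjl->kil', G, A[:, [p, q], :])
                A[:, :, [p, q]] = A[:, :, [p, q]] @ G
                V[:, [p, q]] = V[:, [p, q]] @ G
        if not rotated:
            break
    return V, A
```

For exactly commuting symmetric inputs the returned stack is diagonal up to rounding; for noisy inputs the sweep returns an approximate joint diagonalizer whose residual can be checked with `total_off`.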

B. Optimization-Based (Riemannian Gradient, Newton, Quasi-Newton):

AJD can be posed as smooth optimization on the Stiefel or general linear manifold, enabling the use of Riemannian gradient-descent, trust-region Newton, and quasi-Newton methods. Explicit forms for the Riemannian gradient and Hessian are available for the off-diagonal cost, allowing efficient higher-order methods with local quadratic convergence (Sato, 2014, Ablin et al., 2018, Troedsson et al., 11 Feb 2025, Troedsson et al., 2024). Quasi-Newton variants exploit the block-diagonal structure of the Hessian at optimality.
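To make the optimization viewpoint concrete, the sketch below runs plain Riemannian gradient descent on the orthogonal group for real symmetric inputs. With $B_k = U^\top A_k U$ and $D_k = \mathrm{diag}(B_k)$, the Riemannian gradient of the off-diagonal cost under the embedded Euclidean metric is $U \cdot 2\sum_k [D_k, B_k]$. The QR retraction, the Armijo backtracking constants, and all function names are assumptions chosen for brevity; the cited works use more refined Newton, trust-region, and quasi-Newton schemes.

```python
import numpy as np

def off_cost(U, mats):
    """Sum of squared off-diagonal entries of U.T @ A @ U over all A."""
    total = 0.0
    for A in mats:
        B = U.T @ A @ U
        R = B - np.diag(np.diag(B))
        total += float(np.sum(R ** 2))
    return total

def riemannian_grad(U, mats):
    """Riemannian gradient on O(n): U @ (2 * sum_k [D_k, B_k])."""
    n = U.shape[0]
    omega = np.zeros((n, n))
    for A in mats:
        B = U.T @ A @ U
        D = np.diag(np.diag(B))
        omega += 2.0 * (D @ B - B @ D)   # skew-symmetric for symmetric B
    return U @ omega

def qr_retract(X):
    """Map X back to the orthogonal group via QR with a fixed sign convention."""
    Q, R = np.linalg.qr(X)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs

def gradient_descent_jd(mats, iters=300, t0=1.0, armijo=1e-4):
    n = mats[0].shape[0]
    U = np.eye(n)
    for _ in range(iters):
        g = riemannian_grad(U, mats)
        gnorm2 = float(np.sum(g ** 2))
        if gnorm2 < 1e-20:
            break
        f = off_cost(U, mats)
        t = t0
        while t > 1e-12:                 # backtracking line search
            U_new = qr_retract(U - t * g)
            if off_cost(U_new, mats) <= f - armijo * t * gnorm2:
                break
            t *= 0.5
        else:
            break                        # no acceptable step found
        U = U_new
    return U
```

The backtracking rule makes the cost non-increasing across iterations, but like any first-order method on this nonconvex objective it only guarantees convergence toward a stationary point.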

C. Randomized Approaches (RJD, RJD-BASE):

Randomized joint diagonalization leverages eigen-decomposition of random linear combinations of the input matrices. For symmetric AJD, RJD applies a standard eigenvalue solver to these combinations, selecting the resulting basis that minimizes the sum of off-diagonal energies (He et al., 2022). In multimodal spectral clustering, RJD-BASE samples convex combinations of Laplacians, extracting spectral embeddings directly relevant for clustering using a bottom-$k$ aggregated energy criterion (He et al., 15 Sep 2025).
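The randomized idea for symmetric inputs can be sketched in a few lines: draw Gaussian weights, diagonalize the resulting linear combination, and keep the basis with the lowest total off-diagonal energy over several trials. The names `randomized_jd` and `total_off`, the number of trials, and the Gaussian weight distribution are assumptions for illustration.

```python
import numpy as np

def total_off(Q, mats):
    """Sum of squared off-diagonal entries of Q.T @ A @ Q over all A."""
    total = 0.0
    for A in mats:
        B = Q.T @ A @ Q
        R = B - np.diag(np.diag(B))
        total += float(np.sum(R ** 2))
    return total

def randomized_jd(mats, trials=5, seed=0):
    """Return (Q, cost) for the best of several random-combination eigenbases."""
    rng = np.random.default_rng(seed)
    best_Q, best_cost = None, np.inf
    for _ in range(trials):
        w = rng.standard_normal(len(mats))            # random weights
        M = sum(wk * A for wk, A in zip(w, mats))     # random linear combination
        _, Q = np.linalg.eigh(M)                      # standard symmetric eigensolver
        cost = total_off(Q, mats)
        if cost < best_cost:
            best_Q, best_cost = Q, cost
    return best_Q, best_cost
```

When the inputs commute exactly and the random combination has distinct eigenvalues, a single trial already recovers a joint diagonalizer; repeated trials mainly help in the noisy case.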

D. Sequential and Vector-Wise Algorithms:

Column-wise extraction is based on sequential maximization, at each step seeking an approximate common eigenvector via Riemannian sphere optimization, then constructing the joint diagonalizer by assembling these vectors and correcting for near-orthogonality (Li et al., 2022). This approach is both theoretically grounded (in terms of approximate orthogonality of eigenvectors) and empirically competitive.

E. Bayesian and Probabilistic Frameworks:

A fully Bayesian formulation models the common eigenstructure and variances, reflects uncertainty in the diagonalizer, and supports efficient Gibbs sampling on the Stiefel manifold (Zhong et al., 2012).

F. Block Diagonalization and Partial/Principal Variants:

General joint block-diagonalization (GJBD) seeks a basis in which all input matrices take a common block-diagonal form, that is, block-diagonal with the same block partition. Algorithms here may exploit the eigensystem of a matrix polynomial or employ principal joint block-diagonalization via nonlinear polar decomposition techniques optimized on the Stiefel manifold (Cai et al., 2017, Li et al., 10 Jan 2026).

3. Optimization Theory and Convergence Properties

Joint diagonalization objectives, typically the sum of squared off-diagonal energies, are nonconvex but well-posed in the absence of rank-deficiency, as the cost functional diverges near singularities unless the input matrices share a nontrivial common invariant subspace (Troedsson et al., 2024). Closed-form gradients and Hessians (as matrix operators or bilinear forms) are available for both the Frobenius-based and log-likelihood costs (Troedsson et al., 11 Feb 2025, Ablin et al., 2018, Seghouane et al., 2024).

Multiplicative updates in a moving basis significantly improve the regularity of the descent landscape and the robustness to conditioning, and allow the step-size to be estimated via local Hessian information with practical global convergence safeguards (Troedsson et al., 11 Feb 2025). Riemannian trust-region methods further guarantee convergence from arbitrary initializations (Sato, 2014).

Approximate joint diagonalization in the presence of almost-commuting inputs is now known to be controlled by the commutator norm, extending classical results of Lin and subsequent refinements: if $\|[A, B]\| \le \delta$, algorithms can always attain off-diagonal energy at most $\epsilon(\delta)$, where $\epsilon(\delta) \to 0$ as $\delta \to 0$ (Glashoff et al., 2013, Li et al., 2022).

4. Methods for Large-Scale and Structured JD

For large matrices or large collections:

Partial and Principal Joint Diagonalization: Dimension reduction by extracting a partial jointly (almost) invariant subspace is achieved via subspace-iteration and projection, followed by classic JD in the reduced dimension (Seghouane et al., 2024). This is critical for large-scale applications such as fMRI analysis or multi-channel NMF.
Low-Rank and Block-Diagonalization: GJBD via matrix-polynomial eigenvectors reduces the problem to large but structured eigendecomposition, revealing block-invariant subspaces and allowing for efficient post-processing (Cai et al., 2017). NPDo/SCF iteration for principal JBD is reported to scale linearly with problem size and can handle thousands of dimensions (Li et al., 10 Jan 2026).
Randomized Schemes: Randomization enables JD at a much lower cost, with robust guarantees for approximate joint diagonalization up to an error matching the magnitude of commutators; deflation strategies enable recovery of well-diagonalized subspaces recursively (He et al., 2022).
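The reduce-then-diagonalize pattern described in the first item can be sketched as follows. This is an assumption-laden illustration and not the algorithm of the cited work: the subspace is obtained by subspace iteration on the surrogate $S = \sum_k A_k^2$, and the reduced problem is solved with a single random-combination eigendecomposition standing in for a classic JD routine. All function names and parameters are hypothetical.

```python
import numpy as np

def off(M):
    """Squared Frobenius norm of the off-diagonal part of M."""
    R = M - np.diag(np.diag(M))
    return float(np.sum(R ** 2))

def dominant_subspace(mats, r, iters=100, seed=0):
    """Orthonormal n x r basis from subspace iteration on S = sum_k A_k^2."""
    rng = np.random.default_rng(seed)
    n = mats[0].shape[0]
    Q, _ = np.linalg.qr(rng.standard_normal((n, r)))
    for _ in range(iters):
        Z = sum(A @ (A @ Q) for A in mats)   # apply S without forming it
        Q, _ = np.linalg.qr(Z)               # re-orthonormalize
    return Q

def partial_joint_diag(mats, r, seed=0):
    """Return an n x r matrix X with orthonormal columns such that
    X.T @ A @ X is approximately diagonal for every A in mats."""
    Q = dominant_subspace(mats, r, seed=seed)
    reduced = [Q.T @ A @ Q for A in mats]    # r x r projected matrices
    rng = np.random.default_rng(seed + 1)
    w = rng.standard_normal(len(mats))
    M = sum(wk * C for wk, C in zip(w, reduced))
    _, W = np.linalg.eigh((M + M.T) / 2)     # JD in the reduced dimension
    return Q @ W
```

The expensive steps only involve products of the input matrices with an n x r block, which is what makes this pattern attractive when r is much smaller than n.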

5. Applications Across Disciplines

Joint diagonalization underpins fundamental procedures in:

Blind Source Separation and ICA: Algorithms such as JADE, SOBI, and their generalizations employ JD to separate mixed source signals using second- or higher-order statistics (Eynard et al., 2012, Mesloub et al., 2013, Nait-Meziane et al., 2018, Kleinsteuber et al., 2011).
Multimodal and Diffusion Geometry: Joint diagonalization of Laplacians allows principled fusion of graph-based features from multiple data modalities, extending spectral clustering and manifold learning to multi-view settings (Eynard et al., 2012, He et al., 15 Sep 2025).
Tensor and Multilinear Analysis: Canonical polyadic (CP) and related tensor decompositions depend on extracting a common diagonalizing basis for slices or unfoldings of the tensor (Troedsson et al., 2024).
Factorization for Source Separation: Multichannel NMF methods with joint diagonalization constraints, such as FastMNMF and ILRMA, use JD both to regularize and to accelerate the extraction of demixing matrices (Kamo et al., 2020).
Statistical Methods: Common principal component analysis (CPCA), common spatial patterns (CSP) in EEG, and estimation of multiple variance components rely on JD of covariance or related matrices (Kleinsteuber et al., 2011, Zhong et al., 2012, Vlaming et al., 2021).

JD also arises in quantum information (simultaneous block diagonalization for invariant subspaces), multidimensional signal processing (ESPRIT harmonic retrieval), and network/shape analysis (simultaneous spectral analysis of Laplacians).

6. Limitations, Extensions, and Open Directions

Dimension-dependence and Scalability: Most explicit JD error bounds are asymptotic and introduce unfavorable dependencies on the ambient dimension $n$. Sharpening these bounds, or developing methods with dimension-free rates, remains open (Glashoff et al., 2013).
Extension to Multiple Matrices: While the commutator-based theory is sharp for two Hermitian matrices, extending it to $m > 2$ is nontrivial; the pairwise commutator norms control approximate diagonalizability only up to multiplicative constants depending on the ensemble (Glashoff et al., 2013).
Uniqueness and Identifiability: For non-orthogonal JD, identifiability (uniqueness up to scaling and permutation) depends on subtle algebraic criteria, particularly joint collinearity of diagonals; theory is available for a wide range of application-driven cases (Kleinsteuber et al., 2011).
Noise and Robustness: Bayesian and regularization approaches explicitly account for uncertainty and noise, with model selection tools (e.g., BIC) to estimate the intrinsic number of common eigenvectors (Zhong et al., 2012).
Non-orthogonal, hybrid, and structured JD: Expansions to NOJD, HJD, and PJBD provide additional flexibility at the expense of more algorithmic and theoretical complexity; modern SCF and SVD-based updates mitigate computational costs (Nait-Meziane et al., 2018, Mesloub et al., 2013, Li et al., 10 Jan 2026).

Recent attention focuses on randomized JD methods, integration with scalable NMF/tensor models, and further theoretical refinements on convergence and sharpness in high-dimension and multi-block cases.

References

(Glashoff et al., 2013) Glashoff & Bronstein, "Almost-commuting matrices are almost jointly diagonalizable"
(Seghouane et al., 2024) Seghouane & Saad, "Joint Approximate Partial Diagonalization of Large Matrices"
(Eynard et al., 2012) Eynard et al., "Multimodal diffusion geometry by joint diagonalization of Laplacians"
(Li et al., 10 Jan 2026) Li et al., "An NPDo Approach for Principal Joint Block Diagonalization"
(Troedsson et al., 2024) Troedsson, "On joint eigen-decomposition of matrices"
(Cai et al., 2017) Cai et al., "Solving General Joint Block Diagonalization Problem via Linearly Independent Eigenvectors of a Matrix Polynomial"
(He et al., 15 Sep 2025) He et al., "RJD-BASE: Multi-Modal Spectral Clustering via Randomized Joint Diagonalization"
(Mesloub et al., 2013) Mesloub et al., "A new algorithm for complex non orthogonal joint diagonalization based on Shear and Givens rotations"
(Zhang et al., 2021) Zhang et al., "Leveraging Joint-Diagonalization in Transform-Learning NMF"
(Sato, 2014) Sato, "Riemannian Newton-type methods for joint diagonalization on the Stiefel manifold"
(Li et al., 2022) Li et al., "Vector-wise Joint Diagonalization of Almost Commuting Matrices"
(Liu et al., 2021) Liu et al., "Robust Blind Source Separation by Soft Decision-Directed Non-Unitary Joint Diagonalization"
(Nait-Meziane et al., 2018) Nait-Meziane et al., "Hybrid Joint Diagonalization Algorithms"
(Kleinsteuber et al., 2011) Kleinsteuber & Shen, "Uniqueness Analysis of Non-Unitary Matrix Joint Diagonalization"
(Ablin et al., 2018) Ablin et al., "Beyond Pham's algorithm for joint diagonalization"
(Kamo et al., 2020) Kamo et al., "Regularized Fast Multichannel Nonnegative Matrix Factorization with ILRMA-based Prior Distribution of Joint-Diagonalization Process"
(Vlaming et al., 2021) de Vlaming & Visscher, "Joint Approximate Diagonalization under Orthogonality Constraints"
(He et al., 2022) He & Kressner, "Randomized Joint Diagonalization of Symmetric Matrices"
(Troedsson et al., 11 Feb 2025) Troedsson, "Optimization Methods for Joint Eigendecomposition"
(Zhong et al., 2012) Zhong & Girolami, "A Bayesian Approach to Approximate Joint Diagonalization of Square Matrices"