
One-Sided Jacobi Algorithm: Advances & Applications

Updated 1 February 2026
  • One-Sided Jacobi Algorithm is an iterative method that orthogonalizes matrix columns via 2×2 plane rotations, revealing singular values and spectral features.
  • The technique leverages mixed-precision preconditioning and hierarchical blocking to achieve rapid convergence and efficient, scalable performance on modern hardware.
  • Its variants—trigonometric, hyperbolic, and quaternionic rotations—offer robust, accurate solutions for diverse spectral decomposition and digital signal processing challenges.

The one-sided Jacobi algorithm is a canonical iterative procedure for orthogonalizing the columns of a matrix via successive plane rotations, thereby diagonalizing its Gramian and revealing singular values or spectral characteristics. This approach is distinguished by its systematic application of 2×2 rotations solely from one side (right-multiplication), admitting strong parallelization, superior relative accuracy, and extensibility to trigonometric, hyperbolic, and structure-preserving variants. Recent advances in mixed-precision preconditioning, hierarchical blocking, and highly-parallel GPU/CPU architectures have rendered the one-sided Jacobi method a premier choice for high-performance singular value and eigenvalue computations.

1. Mathematical Framework of the One-Sided Jacobi Algorithm

Let $A\in\mathbb{R}^{m\times n}$ (or $\mathbb{C}^{m\times n}$), $m\ge n$, be of full column rank, with singular value decomposition

$$A = U\Sigma V^T,$$

where $U^T U=I$, $V^T V=I$, and $\Sigma = \operatorname{diag}(\sigma_1,\dots,\sigma_n)$. The one-sided Jacobi algorithm seeks an orthogonal (or suitably structured) $V$ such that the post-rotated columns of $A^{(k+1)}=A^{(k)} J_k$, applied as $n(n-1)/2$ pairwise $2\times 2$ Jacobi rotations per sweep, become mutually orthogonal. For the general SVD, each sweep annihilates selected off-diagonal entries of $A^T A$; after sufficiently many sweeps, normalizing the columns yields the singular values and left singular vectors (Zhang et al., 2022, Gao et al., 2022, Novaković, 2014).
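
The sweep structure described above can be sketched in a few lines of NumPy. This is a minimal reference implementation for small dense matrices, not the blocked, parallel variants discussed in the cited works; the function name is illustrative, and the rotation uses the standard tangent-based angle choice (whose sign convention may differ harmlessly from other presentations).

```python
import numpy as np

def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """Thin SVD of A (m >= n, full column rank) via one-sided Jacobi.

    Right-multiplies A by 2x2 plane rotations until all columns are
    mutually orthogonal; the column norms are then the singular values.
    A minimal sketch for exposition, not a tuned library routine.
    """
    U = np.array(A, dtype=float)
    n = U.shape[1]
    V = np.eye(n)
    for _ in range(max_sweeps):
        off = 0.0  # largest relative off-diagonal Gram entry this sweep
        for i in range(n - 1):
            for j in range(i + 1, n):
                hii = U[:, i] @ U[:, i]
                hjj = U[:, j] @ U[:, j]
                hij = U[:, i] @ U[:, j]
                off = max(off, abs(hij) / np.sqrt(hii * hjj))
                if abs(hij) <= tol * np.sqrt(hii * hjj):
                    continue
                # rotation angle annihilating the (i, j) Gram entry
                tau = (hjj - hii) / (2.0 * hij)
                t = np.sign(tau) / (abs(tau) + np.hypot(1.0, tau)) if tau else 1.0
                c = 1.0 / np.hypot(1.0, t)
                s = c * t
                rot = np.array([[c, s], [-s, c]])
                U[:, [i, j]] = U[:, [i, j]] @ rot
                V[:, [i, j]] = V[:, [i, j]] @ rot
        if off < tol:
            break
    sigma = np.linalg.norm(U, axis=0)
    order = np.argsort(sigma)[::-1]          # sort descending
    sigma, U, V = sigma[order], U[:, order], V[:, order]
    return U / sigma, sigma, V               # A ~= U @ diag(sigma) @ V.T
```

Because every rotation is orthogonal, the product of the accumulated $J_k$ is the computed $V$, and dividing the orthogonalized columns by their norms recovers $U$.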

For the symmetric eigenvalue problem, the method works analogously: starting from a Cholesky or Bunch–Parlett factor $G$ of a Hermitian matrix $H$, it orthogonalizes the columns of $G$ using either trigonometric or hyperbolic Jacobi rotations, depending on the inertia of $H$ (Singer et al., 2010).

2. Jacobi Rotations: Trigonometric, Hyperbolic, Quaternionic

At each step, for the column pair indexed by $(i,j)$, the (real or complex) $2\times 2$ pivot Gram matrix

$$H_{ij} = \begin{bmatrix} A_{:,i}^T A_{:,i} & A_{:,i}^T A_{:,j} \\ A_{:,j}^T A_{:,i} & A_{:,j}^T A_{:,j} \end{bmatrix}$$

is diagonalized by a rotation. For the classical (trigonometric) form,

$$\tan 2\theta = \frac{2|h_{ij}|}{h_{jj}-h_{ii}}, \qquad c=\cos\theta, \qquad s=\operatorname{sign}(h_{ij})\sin\theta,$$

and the columns are updated via

$$[\,g_i,\ g_j\,] \gets [\,g_i,\ g_j\,]\begin{bmatrix} c & -s \\ s & c \end{bmatrix}$$

(Singer et al., 2010).
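
The rotation formula can be exercised on a single column pair. The helper below (a hypothetical name for illustration) uses the tangent-based formulation equivalent to the $\tan 2\theta$ formula above; sign conventions for the rotation vary across texts.

```python
import numpy as np

def rotate_pair(gi, gj):
    """Apply one trigonometric Jacobi rotation to columns gi, gj so that
    the rotated pair is orthogonal while their combined Gram trace is
    preserved (the rotation is orthogonal)."""
    hii, hjj, hij = gi @ gi, gj @ gj, gi @ gj
    if hij == 0.0:                      # already orthogonal
        return gi, gj
    tau = (hjj - hii) / (2.0 * hij)     # cot(2*theta)
    t = np.sign(tau) / (abs(tau) + np.hypot(1.0, tau)) if tau else 1.0
    c = 1.0 / np.hypot(1.0, t)          # cos(theta)
    s = c * t                           # sin(theta)
    return c * gi - s * gj, s * gi + c * gj
```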

For indefinite problems (hyperbolic rotations),

$$\tanh 2\psi = \frac{2|a_{ij}|}{a_{ii}+a_{jj}}, \qquad c=\cosh\psi, \qquad s=\operatorname{sign}(a_{ij})\sinh\psi,$$

with a $J$-unitary action preserving the indefinite norm (Singer et al., 2010, Novakovic et al., 2010).
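
A single hyperbolic rotation can be checked numerically: the matrix $R$ below is $J$-orthogonal for $J=\operatorname{diag}(1,-1)$ and annihilates the off-diagonal entry of a $2\times 2$ pivot by congruence. This is a sketch with made-up pivot values; the signed form used here matches the formula above up to sign convention, and is well defined only when the tanh argument stays below 1 in magnitude.

```python
import numpy as np

def hyperbolic_rotation(a11, a12, a22):
    """cosh/sinh pair with tanh(2*psi) = -2*a12 / (a11 + a22).

    Requires |2*a12 / (a11 + a22)| < 1, the usual pivot condition
    for a hyperbolic Jacobi step.
    """
    psi = 0.5 * np.arctanh(-2.0 * a12 / (a11 + a22))
    return np.cosh(psi), np.sinh(psi)

# illustrative 2x2 pivot (values chosen to satisfy the tanh condition)
a11, a12, a22 = 3.0, 0.4, 2.0
c, s = hyperbolic_rotation(a11, a12, a22)
R = np.array([[c, s], [s, c]])          # hyperbolic rotation
J = np.diag([1.0, -1.0])
A2 = np.array([[a11, a12], [a12, a22]])
B = R.T @ A2 @ R                        # congruence makes B diagonal
```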

In quaternionic settings, $2\times 2$ Jacobi rotations employ orthogonal JRS-symplectic matrices acting on the real counterpart $\Gamma_A$ to preserve quaternion structure, with quadratic convergence under gap conditions (Ma et al., 2018).

3. Mixed-Precision Preconditioned One-Sided Jacobi

Recent enhancements perform an initial SVD at reduced precision ($\upsilon$), orthogonalize the right singular vectors in high precision ($\omega$), and use these as a preconditioner. Specifically (Zhang et al., 2022, Gao et al., 2022):

  • Compute a low-precision SVD $A = Y S Z^T$ (with error $\mathcal{O}(\upsilon)$).
  • Orthogonalize $Z$ in high precision via modified Gram–Schmidt, yielding $Q$.
  • Form $C = AQ$, whose columns are nearly orthogonal: $\|\mathrm{off}(C^T C)\|_F \le \zeta_1 \|A\|^2 \upsilon$.
  • Perform one-sided Jacobi sweeps at precision $\omega$, achieving quadratic convergence:

$$\|\mathrm{off}(C^{(\Delta)T} C^{(\Delta)})\|_F \le \frac{\sqrt{34/9}}{d}\,\|\mathrm{off}(C^T C)\|_F^2.$$

Typically, 2–3 sweeps suffice for double-precision accuracy. Mixed-precision approaches yield approximately a twofold speedup on CPUs/GPUs versus pure double precision, without sacrificing orthogonality or accuracy (Zhang et al., 2022, Gao et al., 2022).
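
The preconditioning step itself is easy to demonstrate in NumPy, with `float32` standing in for the low precision $\upsilon$ and `float64` for $\omega$. This is a sketch: QR is used in place of modified Gram–Schmidt for brevity, and the subsequent Jacobi sweeps are omitted.

```python
import numpy as np

def off_gram(M):
    """Frobenius norm of the off-diagonal part of M^T M."""
    G = M.T @ M
    return np.linalg.norm(G - np.diag(np.diag(G)))

rng = np.random.default_rng(1)
A = rng.standard_normal((300, 100))

# step 1: SVD at reduced precision (float32 plays the role of upsilon)
_, _, Zt = np.linalg.svd(A.astype(np.float32), full_matrices=False)

# step 2: re-orthogonalize the right singular vectors in double precision
Q, _ = np.linalg.qr(Zt.T.astype(np.float64))

# step 3: C = A Q is already nearly column-orthogonal, so only a few
# one-sided Jacobi sweeps in double precision remain to finish the SVD
C = A @ Q
```

Comparing `off_gram(A)` with `off_gram(C)` shows the off-diagonal Gram mass collapsing by several orders of magnitude before any Jacobi sweep runs.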

4. Parallelization, Blocking, and Pivot Ordering

The one-sided Jacobi algorithm is highly amenable to parallel and hierarchical blocking. Modern implementations distribute block-columns across MPI processes or GPU nodes, diagonalizing independent pivot blocks in each “p-step” using distinct rotation sets. Hierarchical blocking exploits memory tiers (global RAM, shared memory, registers) for maximal local computation (Novaković, 2014). Parallel pivot strategies, including modulus, Brent–Luk, row-cyclic-closest, and column-cyclic-closest, optimize p-step packing for load balancing and minimal communication (Novaković, 2014).
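
A classic example of such a pivot ordering is the round-robin (Brent–Luk style) schedule sketched below: for even $n$ it packs all $n(n-1)/2$ pairs into $n-1$ p-steps of $n/2$ disjoint pairs, so every rotation within a p-step can proceed in parallel. The ordering alone is shown, not the rotations.

```python
def round_robin_schedule(n):
    """Brent-Luk style round-robin pivot ordering for even n.

    Returns n-1 p-steps; each p-step is a list of n/2 column pairs in
    which no column repeats, so its rotations are mutually independent.
    """
    assert n % 2 == 0, "schedule shown for even n only"
    players = list(range(n))
    steps = []
    for _ in range(n - 1):
        steps.append([(players[i], players[n - 1 - i]) for i in range(n // 2)])
        # circle method: hold players[0] fixed, rotate the rest by one
        players = [players[0], players[-1]] + players[1:-1]
    return steps
```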

Block Cholesky and diagonal-pivoted strategies further accelerate convergence in hyperbolic settings, especially in indefinite eigenvector problems (Singer et al., 2010, Novakovic et al., 2010). Asynchronous communication, algorithmic barriers, and peer-to-peer GPU transfers are engineered to minimize synchronization costs and maintain scalable throughput.

5. Convergence Properties and Numerical Stability

The one-sided Jacobi algorithm exhibits global linear convergence of the off-diagonal Frobenius norm, and, once the Gramian is sufficiently diagonalized and a spectral gap is evident, quadratic convergence dominates (Zhang et al., 2022, Ma et al., 2018). The method is highly robust to roundoff error due to its use of orthogonal transformations.

Backward stability analyses for both standard and mixed-precision variants confirm that the computed singular vectors and values satisfy

$$(X+\Delta X)V = U\Sigma, \qquad \|\Delta X(i,:)\|_2 \le \epsilon_J\,\|X(i,:)\|_2, \qquad \epsilon_J = \mathcal{O}(nu),$$

with similar bounds for QR-based preconditioning steps (Gao et al., 2022). In hyperbolic and quaternionic cases, structure preservation and signature constraints ensure high relative accuracy and robust orthogonality ($\|U^T U - I\|_F \sim 10^{-14}$) (Singer et al., 2010, Ma et al., 2018).

6. Computational Complexity and Performance

Each $2\times 2$ Jacobi rotation requires $\mathcal{O}(m)$ or $\mathcal{O}(n)$ flops, and a complete sweep over all pairs entails $\mathcal{O}(mn^2)$ or $\mathcal{O}(n^3)$ work, amortized by concurrent computation and blocked updates (Zhang et al., 2022, Novaković, 2014). Hierarchical blocking improves locality, reducing data movement per sweep to $\mathcal{O}(mn)$. Empirical timings on large matrices ($n > 3{,}000$) confirm 1.5–2.4× speedups versus standard LAPACK or MAGMA SVD routines in both CPU and multi-GPU settings (Novaković, 2014, Gao et al., 2022). Hyperbolic full-block Cholesky variants further reduce both sweep count and wall time versus non-pivoted approaches (Singer et al., 2010).

Accuracy tests demonstrate singular value deviations below $5\times 10^{-14}$ relative to reference algorithms, with negligible loss from mixed-precision computation.

7. Extensions, Structure-Preserving, and Application Domains

The one-sided Jacobi framework admits significant generalization:

  • Structure-preserving Jacobi algorithms for quaternions orthogonalize columns via JRS-symplectic rotations acting on real counterparts, supporting compact SVD for quaternion matrices and applications in color image compression (Ma et al., 2018).
  • Hyperbolic Jacobi methods extend to indefinite Hermitian problems and HSVD computation, with GPU-based multi-block variants delivering up to 17× speedup versus sequential runs for large $n$ (Novakovic et al., 2010, Singer et al., 2010).
  • Mixed-precision Jacobi SVDs, leveraging QR or low-precision SVD preconditioning, are now routine in modern eigensolvers for dense matrices (Zhang et al., 2022, Gao et al., 2022).
  • Parallel block Jacobi algorithms are exploited in distributed-memory clusters, shared-memory machines, and multi-GPU topologies for scalable large-scale spectral decompositions (Novaković, 2014, Singer et al., 2010).

A plausible implication is that further developments in memory hierarchy and low-precision arithmetic will continue to enhance the efficiency of one-sided Jacobi algorithms across domains requiring high-precision spectral computations.
