
Cyclic-by-Row Jacobi Algorithm

Updated 6 September 2025
  • The cyclic-by-row Jacobi algorithm is a fixed-order iterative method that computes eigenvalues and eigenvectors by systematically applying orthogonal or J-orthogonal rotations.
  • It eliminates element selection overhead through a row-wise pivot strategy, enabling efficient parallel and block-based implementations.
  • The method extends to hyperbolic, mixed-precision, and block-cyclic frameworks, significantly enhancing scalability, numerical robustness, and performance in large-scale eigensolvers.

The cyclic-by-row Jacobi algorithm is a fixed-order iterative method for computing the eigenvalues and eigenvectors of symmetric (or Hermitian) matrices. By applying a systematic sequence of orthogonal or J-orthogonal transformations to drive off-diagonal elements to zero, it eliminates the element selection overhead of classical Jacobi algorithms while enabling efficient, highly parallel implementations. The method generalizes naturally to block and parallel settings, is compatible with mixed-precision and preconditioning strategies, and underpins many contemporary large-scale eigensolvers.

1. Algorithmic Structure and Pivot Strategy

Classically, the Jacobi method applies a succession of elementary 2×2 rotations, each designed to zero out a single off-diagonal matrix element. In the cyclic-by-row Jacobi algorithm, the sequence of pivot pairs (p, q) is determined a priori, typically sweeping row-wise: for p = 1 to n–1, all q > p are processed in turn. The procedure for each sweep is:

  1. For each pair (p, q) in the prescribed cyclic order, check whether $|A_{pq}|$ exceeds a given threshold (often $\mathrm{tol} \cdot \sqrt{|A_{pp} A_{qq}|}$ for scaling invariance).
  2. If so, compute a Jacobi (Givens) rotation:

$$\tau = \frac{A_{qq} - A_{pp}}{2A_{pq}}, \qquad t = \begin{cases} 1/\left(\tau + \sqrt{1 + \tau^2}\right), & \tau \geq 0 \\ 1/\left(\tau - \sqrt{1 + \tau^2}\right), & \tau < 0 \end{cases}$$

$$c = 1/\sqrt{1 + t^2}, \qquad s = t \cdot c$$

  3. Apply the rotation to rows and columns p and q.
  4. Accumulate the transformation in the eigenvector matrix, if desired.

Each sweep thus processes all n(n–1)/2 off-diagonal pairs in a fixed order, performing O(n²) rotations (each applied at O(n) cost) with no per-step search for a maximal element (Zhou, 30 Aug 2025). Extension to block Jacobi and block modulus (cyclic) strategies is achieved by grouping columns or rows into blocks and applying block rotations simultaneously, forming the basis for parallel execution (Singer et al., 2010, Singer et al., 2010).
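
The sweep above maps directly to code. The following is a minimal NumPy sketch of the full algorithm for a real symmetric matrix; the function name `jacobi_cyclic`, the tolerance default, and the sweep cap are illustrative choices rather than values from the cited papers.

```python
import numpy as np

def jacobi_cyclic(A, tol=1e-12, max_sweeps=30):
    """Cyclic-by-row Jacobi eigensolver for a real symmetric matrix.

    Returns (eigenvalues, eigenvectors). A reference sketch only:
    production codes use blocking and level-3 BLAS updates instead.
    """
    A = np.array(A, dtype=float)              # work on a copy
    n = A.shape[0]
    V = np.eye(n)                             # accumulated eigenvectors
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):                # fixed row-wise pivot order
            for q in range(p + 1, n):
                # Scaling-invariant threshold test on the pivot element.
                if abs(A[p, q]) <= tol * np.sqrt(abs(A[p, p] * A[q, q])):
                    continue
                rotated = True
                # Stable rotation formulas from the sweep description.
                tau = (A[q, q] - A[p, p]) / (2.0 * A[p, q])
                t = (1.0 / (tau + np.hypot(1.0, tau)) if tau >= 0
                     else 1.0 / (tau - np.hypot(1.0, tau)))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = t * c
                # Apply the rotation to columns p and q ...
                Ap, Aq = A[:, p].copy(), A[:, q].copy()
                A[:, p], A[:, q] = c * Ap - s * Aq, s * Ap + c * Aq
                # ... then to rows p and q (two-sided update).
                Ap, Aq = A[p, :].copy(), A[q, :].copy()
                A[p, :], A[q, :] = c * Ap - s * Aq, s * Ap + c * Aq
                A[p, q] = A[q, p] = 0.0       # pivot annihilated exactly
                # Accumulate the transformation in the eigenvector matrix.
                Vp, Vq = V[:, p].copy(), V[:, q].copy()
                V[:, p], V[:, q] = c * Vp - s * Vq, s * Vp + c * Vq
        if not rotated:                       # all pivots below threshold
            break
    return np.diag(A), V
```

A call `evals, evecs = jacobi_cyclic(A)` then satisfies `A ≈ evecs @ np.diag(evals) @ evecs.T` up to the chosen tolerance.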

2. Mathematical Foundations and Rotations

At its core, the method operates by successively diagonalizing 2×2 pivot submatrices embedded within the larger matrix. For standard (trigonometric) Jacobi rotations:

$$Q_p = \begin{bmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{bmatrix}$$

$Q_p^* A_p Q_p$ diagonalizes each pivot, with the angle $\varphi$ computed as described above.

In the generalized J-Jacobi or hyperbolic cases, the pivot is constructed as $A_p = [g_i, g_j] J [g_i, g_j]^*$. When the signs in $J$ differ, a hyperbolic rotation is used:

$$Q_p = \begin{bmatrix} \cosh\varphi & \sinh\varphi \\ \sinh\varphi & \cosh\varphi \end{bmatrix}$$

This distinction is crucial when dealing with indefinite inner products or generalized eigenvalue problems (Singer et al., 2010). The computed rotations are applied (one-sided or two-sided), and transformations are accumulated in the eigenvector approximation matrix.
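
As a quick sanity check, the sketch below (self-contained, with an arbitrarily chosen angle) verifies numerically that such a hyperbolic rotation is J-orthogonal, i.e. $Q_p^* J Q_p = J$ for $J = \mathrm{diag}(1, -1)$, which follows from the identity $\cosh^2\varphi - \sinh^2\varphi = 1$.

```python
import numpy as np

# J-orthogonality check for a hyperbolic rotation: Q^T J Q = J
# with J = diag(1, -1); relies on cosh^2(phi) - sinh^2(phi) = 1.
phi = 0.7                                   # arbitrary illustrative angle
J = np.diag([1.0, -1.0])
Q = np.array([[np.cosh(phi), np.sinh(phi)],
              [np.sinh(phi), np.cosh(phi)]])
assert np.allclose(Q.T @ J @ Q, J)          # preserves the indefinite metric
```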

3. Parallelization, Blocking, and Modulus Scheduling

The cyclic-by-row ordering is particularly amenable to parallel and block algorithms. By assigning blocks (or block-pairs) to processes in a ring (one-dimensional torus) and applying pivots along anti-diagonals in a modulus or round-robin fashion, it is possible to ensure that, at every stage, all pivot blocks are disjoint and can be updated concurrently without data conflicts (Singer et al., 2010, Singer et al., 2010).

A typical parallelization employs:

  • Outer blocking: partitioning columns among processes for load balance.
  • Inner blocking: optimizing for cache/locality, subdividing each process’s blocks for local computation.
  • Cyclic communication: after each cycle, blocks are rotated among processes (odd/even sweeps alternate the block flow direction), governed by routines like "Next_Pair" and "Send_Receive" (Singer et al., 2010).
  • BLAS-3 kernels: block-level orthogonalizations are performed using highly optimized level-3 BLAS operations for efficiency (Singer et al., 2010).

This form of cyclic scheduling leads to excellent scalability on both distributed-memory (MPI-based) and shared-memory architectures, with the communication pattern naturally minimizing idle times and avoiding contention (Singer et al., 2010).
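
One simple way to generate such conflict-free stages is the classic round-robin (tournament) ordering sketched below; this is an illustrative stand-in for the modulus strategies of the cited papers, and the function name and even-block assumption are ours.

```python
def round_robin_schedule(n_blocks):
    """Conflict-free parallel pivot stages for an even number of blocks.

    Each stage pairs all blocks into disjoint (p, q) pivots, and every
    pair meets exactly once over n_blocks - 1 stages.
    """
    assert n_blocks % 2 == 0, "pad with a dummy block if odd"
    ring = list(range(n_blocks))
    stages = []
    for _ in range(n_blocks - 1):
        stages.append([tuple(sorted((ring[i], ring[-1 - i])))
                       for i in range(n_blocks // 2)])
        # Hold ring[0] fixed and rotate the rest by one position.
        ring = [ring[0], ring[-1]] + ring[1:-1]
    return stages

# Example: 6 blocks give 5 stages of 3 disjoint block pairs each;
# the pairs within a stage can be updated concurrently without conflicts.
for stage in round_robin_schedule(6):
    print(stage)
```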

4. Convergence Properties and Pivot Strategy Robustness

Rigorous convergence analysis has established that the cyclic-by-row Jacobi algorithm, and more generally all serial/generalized serial block Jacobi strategies, are globally convergent for real symmetric (and complex Hermitian) matrices (Hari et al., 2016, Hari et al., 2018). After each full sweep, the squared Frobenius norm of the off-diagonal entries (the "off-norm" $S(A)$) contracts by a fixed ratio $c < 1$:

$$S^2(A^{(k+1)}) \leq c\, S^2(A^{(k)}), \qquad c < 1$$

This contraction is robust to a wide class of pivot orderings, extending to quasi-cyclic, weakly equivalent, and parallel block strategies (Begovic et al., 2017, Begovic et al., 2017). For n=4, all 720 cyclic pivot strategies have been shown to exhibit global convergence, with reductions such as

$$S(A^{[t+3]}) \leq \gamma\, S(A^{[t]})$$

for some $\gamma < 1$ independent of the pivot order (Begovic et al., 2017). However, the per-cycle reduction may be nonuniform; theoretical results show that in worst-case constructions, a single sweep may only slightly decrease $S(A)$, but across two or three cycles, uniform convergence is recovered (Begovic et al., 2017, Begovic et al., 2017).
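
This contraction is easy to observe numerically. The following self-contained sketch (matrix size, random seed, and sweep count are arbitrary illustrative choices) tracks $S(A)$ over successive cyclic-by-row sweeps on a random symmetric matrix; a rapid, eventually quadratic decrease is typical.

```python
import numpy as np

def off_norm(A):
    """Off-norm S(A): Frobenius norm of the off-diagonal part."""
    return np.linalg.norm(A - np.diag(np.diag(A)))

rng = np.random.default_rng(0)
B = rng.standard_normal((8, 8))
A = (B + B.T) / 2.0                    # random symmetric test matrix

for sweep in range(5):
    n = A.shape[0]
    for p in range(n - 1):             # one cyclic-by-row sweep
        for q in range(p + 1, n):
            if A[p, q] == 0.0:
                continue
            tau = (A[q, q] - A[p, p]) / (2.0 * A[p, q])
            t = (1.0 / (tau + np.hypot(1.0, tau)) if tau >= 0
                 else 1.0 / (tau - np.hypot(1.0, tau)))
            c = 1.0 / np.sqrt(1.0 + t * t)
            s = t * c
            R = np.eye(n)              # plane rotation in coordinates p, q
            R[p, p] = R[q, q] = c
            R[p, q], R[q, p] = s, -s
            A = R.T @ A @ R
    print(f"sweep {sweep + 1}: S(A) = {off_norm(A):.3e}")
```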

5. Numerical Stability, Accuracy, and Practical Performance

The algorithm's robustness extends to high accuracy in eigenvalue and eigenvector computations. Blocking and pivoting strategies improve convergence rates and the numerical properties of the eigenvector matrix, especially in hyperbolic variants, where measured orthogonality errors can reach $10^{-15}$, versus $10^{-12}$ for the trigonometric case (Singer et al., 2010). Complete pivoting (trigonometric) or pivoted Cholesky (hyperbolic) enables eigenvalue "sorting" during the algorithm, improving convergence even on clustered spectra.

Testing on large matrices (orders $10^3$ to $1.6 \times 10^4$) demonstrated that cyclic-by-row blocking, with appropriate parallelization and pivoting, reduced the necessary in-process sweeps and led to speedups of ~30% over classical algorithms (Singer et al., 2010). Communication overhead is minimized through regular cyclic patterns, and the use of BLAS-3 operations leverages modern hardware.

Mixed-precision and preconditioned Jacobi variants further enhance performance and accuracy. By preconditioning the matrix with an approximate eigenvector matrix computed in low precision, then applying high-precision Jacobi updates, the relative forward error in eigenvalues can be controlled independently of the original matrix condition number, provided the preconditioner reduces off-diagonal dominance (Higham et al., 7 Jan 2025). In performance terms, mixed-precision preconditioning and polar decomposition–based orthogonalization can reduce total solution time by up to 70% compared to a traditional Jacobi solver (Zhou, 30 Aug 2025).
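
Schematically, this preconditioning strategy can be expressed as below; this is a sketch of the general idea rather than of the cited implementations, `jacobi_solver` stands for any high-precision Jacobi routine (e.g. the `jacobi_cyclic` sketch from Section 1), and the float32/float64 pairing is one illustrative choice of precisions.

```python
import numpy as np

def preconditioned_jacobi(A, jacobi_solver):
    """Mixed-precision preconditioned Jacobi eigensolver (schematic).

    A low-precision eigendecomposition supplies an approximate
    eigenvector matrix Q0; since Q0.T @ A @ Q0 is nearly diagonal,
    the high-precision Jacobi sweeps that follow converge quickly.
    """
    _, Q0 = np.linalg.eigh(A.astype(np.float32))  # low-precision stage
    Q0 = Q0.astype(np.float64)
    M = Q0.T @ A @ Q0                             # preconditioned matrix
    evals, V = jacobi_solver(M)                   # high-precision refinement
    return evals, Q0 @ V                          # composed eigenvectors
```

For example, `preconditioned_jacobi(A, jacobi_cyclic)` combines the two sketches.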

6. Extensions: Block, One-Sided, Hyperbolic, and Mixed-Precision Frameworks

The cyclic-by-row Jacobi algorithm generalizes across multiple axes:

  • Block and Hierarchical Variants: Replacing single elements with blocks enables local BLAS-3 computations and efficient use of cache, especially in three-level parallel J-Jacobi algorithms (Singer et al., 2010).
  • One-sided / Hyperbolic Jacobi: Designed for indefinite inner products or generalized eigenvalue problems, employing hyperbolic rotations and J-orthogonal transformations (Singer et al., 2010).
  • Mixed-Precision Eigensolvers: Using low-precision computations for preconditioning and initial diagonalization, followed by high-precision refinement, ensures high-accuracy eigenvalues with significantly reduced cost (Higham et al., 7 Jan 2025, Zhou, 30 Aug 2025).
  • Block-cyclic and Modulus Scheduling: Advanced process scheduling mechanisms for distributed-memory architectures, assigning block pairs via cyclic shifts in a ring topology and removing pivot dependencies between blocks to maximize parallelism (Singer et al., 2010, Singer et al., 2010).

Recent innovations further demonstrate improvements in numerical robustness, via improved stable rotation formulas (Borges, 2018), and extensions to more general matrix classes, such as Hamiltonian and skew-Hamiltonian matrices via symplectic transformations (Baumgarten, 2020).

7. Applications and Contemporary Use

The cyclic-by-row Jacobi algorithm and its extensions are prevalent in high-performance eigenvalue solvers for dense symmetric and Hermitian matrices, especially where high relative accuracy and scalability are critical. Its robustness in both sequential and massively parallel contexts, as well as compatibility with modern mixed-precision and preconditioned approaches, makes it suitable for scientific computing applications ranging from quantum mechanics to large-scale simulations (Singer et al., 2010, Higham et al., 7 Jan 2025, Zhou, 30 Aug 2025).

The method also serves as a core component in block algorithms for the singular value decomposition and for solving generalized (J-)eigenvalue problems, where guaranteed convergence and high-quality eigenvector computation are required. Its adaptability to current hardware architectures—distributed and shared memory, GPUs, and mixed-precision processors—ensures continued relevance in computational linear algebra (Singer et al., 2010, Gao et al., 2022, Islam et al., 2020).