Lower FLOP bound for full tridiagonalization

Establish a formal arithmetic lower bound proving that any algorithm performing full tridiagonalization of a dense n×n skew-symmetric matrix via the LTL^T factorization (computing X=LTL^T with tridiagonal T) requires at least n^3/3 floating-point operations, i.e., demonstrate that full tridiagonalization cannot be accomplished in fewer than n^3/3 FLOPs.
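
To make the object of the question concrete, here is a minimal NumPy sketch of an unblocked, Gauss-transform-based tridiagonalization X = L T L^T without pivoting. The function name and structure are illustrative assumptions, not the paper's implementation; for clarity it updates the full matrix rather than exploiting skew-symmetry, so it does not itself attain the n^3/3 count.

```python
import numpy as np

def skew_tridiagonalize(X):
    """Return (L, T) with X = L @ T @ L.T, T tridiagonal, L unit lower triangular."""
    A = np.array(X, dtype=float)      # work on a copy of the skew-symmetric input
    n = A.shape[0]
    L = np.eye(n)
    for k in range(n - 2):
        pivot = A[k + 1, k]           # subdiagonal pivot (assumed nonzero; no pivoting here)
        l = A[k + 2:, k] / pivot      # Gauss-transform multipliers
        L[k + 2:, k + 1] = l
        # Congruence with M = I - l e_{k+1}^T zeroes column k below the
        # subdiagonal and, by skew-symmetry, row k beyond the superdiagonal.
        A[k + 2:, :] -= np.outer(l, A[k + 1, :])
        A[:, k + 2:] -= np.outer(A[:, k + 1], l)
    return L, A

# Quick check on a random skew-symmetric matrix
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
X = B - B.T
L, T = skew_tridiagonalize(X)
assert np.allclose(L @ T @ L.T, X)
assert np.allclose(np.triu(T, 2), 0) and np.allclose(np.tril(T, -2), 0)
```

Each of the n-2 steps applies a rank-1 congruence update to the trailing block; structure-exploiting variants that touch only one triangle of the trailing matrix are what reach the n^3/3 leading-order cost discussed in the Background below.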

Background

The paper develops and implements high-performance algorithms for tridiagonalizing skew-symmetric matrices via the Gauss-transform-based LTL^T factorization, with leading-order computational cost n^3/3 FLOPs for most blocked and unblocked variants. By analogy with classic dense linear algebra factorizations (e.g., Cholesky and LU), the authors argue that achieving a full tridiagonalization with fewer than n^3/3 FLOPs is highly unlikely, but they note that a rigorous proof establishing such a lower bound is still missing.
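
As a rough sketch of where the leading-order term comes from (an illustrative count, not the paper's derivation), suppose step k of an unblocked variant performs on the order of (n-k)^2 FLOPs on the trailing block, i.e., a rank-1-type congruence update restricted to one triangle of the skew-symmetric trailing matrix. Summing over the n steps gives

\[
\sum_{k=1}^{n} (n-k)^2 \;=\; \frac{(n-1)\,n\,(2n-1)}{6} \;=\; \frac{n^3}{3} + O(n^2),
\]

which matches the n^3/3 leading-order cost quoted above.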

Establishing this lower bound would solidify theoretical expectations for the arithmetic complexity of skew-symmetric tridiagonalization and would steer optimization efforts toward reducing communication and memory traffic rather than attempting to push FLOP counts below n^3/3.

References

"In analogy to the Cholesky and LU decompositions, it is highly unlikely that a full tridiagonalization can be accomplished in fewer than n^3/3 FLOPs (but this remains to be proven)."

Skew-Symmetric Matrix Decompositions on Shared-Memory Architectures (arXiv:2411.09859, Satyarth et al., 15 Nov 2024), Section "Implementation", first paragraph.