Effectiveness of Advanced Compression versus Low-Precision Casting in CB-GMRES

Determine whether compression techniques more sophisticated than casting to low precision, applied to the Krylov basis within the Compressed Basis GMRES (CB-GMRES) solver on GPUs, can enable large end-to-end runtime savings while preserving the accuracy of the final linear system solution.

Background

GMRES is a memory-bound iterative solver, so its performance on GPUs is limited by main-memory bandwidth. Compressed Basis GMRES (CB-GMRES) alleviates this by storing the Krylov basis vectors in lower precision (e.g., single or half precision), which reduces memory traffic and often improves runtime at the cost of a tolerable convergence delay.
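The storage trade-off can be illustrated with a minimal sketch. It assumes that only the stored copy of a basis vector is rounded to a narrower IEEE format while the arithmetic stays in double precision; the helper names (`round_to_storage`, `dot`) and the test vectors are hypothetical and are not taken from any CB-GMRES implementation.

```python
import struct

def round_to_storage(values, fmt):
    """Round float64 values to a narrower IEEE format and read them back.

    fmt is a struct code: 'd' (64-bit), 'f' (32-bit) or 'e' (16-bit).
    Returns the rounded values and the number of bytes they occupy.
    """
    packed = struct.pack(f"<{len(values)}{fmt}", *values)
    return list(struct.unpack(f"<{len(values)}{fmt}", packed)), len(packed)

def dot(u, v):
    # Arithmetic is done in float64 (Python floats), whatever the storage format.
    return sum(a * b for a, b in zip(u, v))

# Hypothetical basis vector v and work vector w (illustrative data only).
n = 1000
v = [((i * 37) % 101 - 50) / 64.0 + 1e-3 * i / n for i in range(n)]
w = [((i * 53) % 97 - 48) / 32.0 for i in range(n)]

exact = dot(v, w)
for fmt, name in (("d", "double"), ("f", "single"), ("e", "half")):
    stored, nbytes = round_to_storage(v, fmt)
    err = abs(dot(stored, w) - exact)
    print(f"{name:6s} bytes={nbytes:5d} abs. dot error={err:.3e}")
```

Halving the storage width halves the bytes read per basis vector, which is where the bandwidth saving comes from; the growing dot-product error is the source of the convergence delay.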

The paper explores whether more sophisticated, in-register, block-based lossy compression, going beyond simple low-precision casting, can further reduce the cost of data movement without degrading the accuracy of the final solution. To pay off on a GPU, such a scheme must meet stringent constraints (e.g., decompression at memory-bandwidth speed and random access to individual blocks); otherwise its overhead negates the performance gains.
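To make the idea of block-based lossy compression with random block access concrete, here is a minimal sketch of a generic block floating-point scheme: each fixed-size block stores one shared exponent plus a short signed integer per value. This is an illustrative stand-in, not the FRSZ2 format; the block size, the bit width, and all function names are assumptions, and the sketch runs on the CPU rather than in GPU registers.

```python
import math

BLOCK = 32       # values per block (assumed)
MANT_BITS = 15   # magnitude bits per value, plus one sign bit (assumed)

def compress_block(values):
    """Encode one block as (shared_exponent, list of signed integers)."""
    max_abs = max(abs(x) for x in values)
    # Shared exponent e chosen so that every |x| in the block is below 2**e.
    e = math.frexp(max_abs)[1] if max_abs > 0.0 else 0
    limit = (1 << MANT_BITS) - 1
    # Scale by 2**(MANT_BITS - e), round to nearest, clamp to the bit budget.
    ints = [max(-limit, min(limit, round(math.ldexp(x, MANT_BITS - e))))
            for x in values]
    return e, ints

def decompress_block(block):
    """Decode one block back to float64 values."""
    e, ints = block
    return [math.ldexp(float(q), e - MANT_BITS) for q in ints]

def compress(vector):
    """Split a vector into fixed-size blocks and compress each independently."""
    return [compress_block(vector[i:i + BLOCK])
            for i in range(0, len(vector), BLOCK)]

def read_element(blocks, i):
    """Random access: only the block that holds element i is decoded."""
    return decompress_block(blocks[i // BLOCK])[i % BLOCK]

# Illustrative data whose magnitude drops by a factor of 10 every 64 entries.
vec = [math.sin(0.1 * i) * 10.0 ** (-(i // 64)) for i in range(128)]
blocks = compress(vec)
worst = max(abs(read_element(blocks, i) - x) for i, x in enumerate(vec))
print(f"blocks={len(blocks)} worst abs. error={worst:.3e}")
```

Because every block has the same compressed size and is encoded independently, the location of any block can be computed directly from its index, which is the property that random block access requires. Sharing one exponent per block lets each value spend its bits on the mantissa instead, so at a similar storage cost the scheme can retain more significant bits than a half-precision cast for values of similar magnitude within a block.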

References

An open question is whether compression techniques that are more sophisticated than casting to low precision can enable large runtime savings while preserving the accuracy of the final results.

FRSZ2 for In-Register Block Compression Inside GMRES on GPUs (2409.15468 - Grützmacher et al., 23 Sep 2024) in Abstract