Parallel multi-CPU and GPU implementations of MF-LogDet matrix-free kernels

Develop parallel multi-CPU and GPU implementations of the matrix-free kernels used by the matrix-free log-determinant (MF-LogDet) framework, including sparse monomial-based routines for gradient evaluation, Hessian–vector products, and directional third-order contractions.

Background

MF-LogDet relies on matrix-free operator evaluations that decompose over monomials and are naturally parallelizable. The authors emphasize that these computations are well suited to modern multicore and accelerator architectures.
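To make the decomposition over monomials concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical sparse representation in which each monomial is a pair (coefficient, {variable index: exponent}), and it evaluates the gradient and a Hessian–vector product of a polynomial by summing independent per-monomial contributions. The names MONOMIALS, dpow, monomial_grad_hvp, chunk_contribution, and parallel_grad_hvp are illustrative only. A thread pool is used purely to show the map-and-reduce structure; in CPython, real speedups for this kind of arithmetic would likely require processes or native CPU/GPU kernels.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sparse storage: (coefficient, {variable_index: exponent}).
# Example polynomial: p(x) = 3*x0^2*x1 + 2*x1*x2
MONOMIALS = [
    (3.0, {0: 2, 1: 1}),
    (2.0, {1: 1, 2: 1}),
]

def dpow(x, a, order):
    """order-th derivative of x**a in x, for a non-negative integer a."""
    factor = 1.0
    for k in range(order):
        factor *= (a - k)
    if factor == 0.0:
        return 0.0  # derivative vanishes; also avoids negative powers of 0
    return factor * x ** (a - order)

def monomial_grad_hvp(coeff, exps, x, v):
    """Sparse gradient and Hessian-vector product of a single monomial."""
    idx = list(exps)
    grad, hvp = {}, {}
    for i in idx:
        g = coeff
        for k in idx:
            g *= dpow(x[k], exps[k], 1 if k == i else 0)
        grad[i] = g
        h = 0.0
        for j in idx:
            term = coeff
            for k in idx:
                # differentiate once per match with i and once per match with j
                order = (k == i) + (k == j)
                term *= dpow(x[k], exps[k], order)
            h += term * v[j]
        hvp[i] = h
    return grad, hvp

def chunk_contribution(chunk, x, v):
    """Accumulate dense partial results for one chunk of monomials."""
    n = len(x)
    grad = [0.0] * n
    hvp = [0.0] * n
    for coeff, exps in chunk:
        g, h = monomial_grad_hvp(coeff, exps, x, v)
        for i, val in g.items():
            grad[i] += val
        for i, val in h.items():
            hvp[i] += val
    return grad, hvp

def parallel_grad_hvp(monomials, x, v, workers=2):
    """Map chunks of monomials to workers, then reduce the partial sums."""
    chunks = [monomials[w::workers] for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: chunk_contribution(c, x, v), chunks))
    n = len(x)
    grad = [sum(p[0][i] for p in partials) for i in range(n)]
    hvp = [sum(p[1][i] for p in partials) for i in range(n)]
    return grad, hvp

grad, hvp = parallel_grad_hvp(MONOMIALS, [1.0, 2.0, 3.0], [1.0, 0.0, 1.0])
print(grad)  # [12.0, 9.0, 4.0]
print(hvp)   # [12.0, 8.0, 0.0]
```

Because each monomial touches only the variables it contains, the per-chunk work is independent and the only synchronization is the final reduction. The same pattern would map onto one GPU thread or thread block per monomial with an atomic or tree-based reduction, although that mapping is an assumption here rather than something the source specifies.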

Despite this suitability, the paper explicitly lists multi-CPU and GPU implementations of these kernels as an open direction for achieving further scalability gains.

References

Several directions remain open for future study, including the design of effective preconditioners for tangent linear systems, multi-CPU and GPU implementations of the matrix-free kernels, and extensions beyond the polynomial setting to broader structured function classes such as trigonometric polynomials, symmetric polynomials, and low-rank structured models.