Explain Rust and Numba slowdown relative to Rust-from-Python for large dense matvecs
Determine why the native Rust implementation and the Numba JIT-compiled Python implementation of the dense matrix–vector product show higher average runtimes than the PyO3-based Rust-from-Python implementation at the larger matrix sizes in the experiments, and identify the factors responsible for this discrepancy.
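As an illustration of the kind of measurement in question, the following is a minimal sketch of an average-runtime harness for a dense matrix–vector product. It is not the paper's benchmark code: the names `matvec`, `average_runtime`, and `run_benchmark`, the matrix sizes, and the repeat counts are all hypothetical, and a pure-Python kernel stands in for the Rust, Numba, and PyO3 implementations. The warm-up call is included because a JIT-compiled function such as a Numba kernel compiles on first use, which would otherwise inflate the average.

```python
import random
import time


def matvec(a, x):
    """Dense matrix-vector product y = A x for a list-of-rows matrix."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def average_runtime(fn, a, x, repeats=5, warmup=1):
    """Mean wall-clock time of fn(a, x) over `repeats` timed calls.

    Warm-up calls are run first and not timed, so one-off costs
    (for example JIT compilation) do not count towards the average.
    """
    for _ in range(warmup):
        fn(a, x)
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        fn(a, x)
        total += time.perf_counter() - start
    return total / repeats


def run_benchmark(sizes=(64, 128, 256), repeats=5, seed=0):
    """Average runtime per matrix size (sizes are illustrative only)."""
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        a = [[rng.random() for _ in range(n)] for _ in range(n)]
        x = [rng.random() for _ in range(n)]
        results[n] = average_runtime(matvec, a, x, repeats=repeats)
    return results


if __name__ == "__main__":
    for n, seconds in run_benchmark().items():
        print(f"n={n}: {seconds:.6f} s on average")
```

To compare implementations, one would pass each candidate kernel as `fn` with identical inputs. Separating the cost of crossing a language boundary, allocating the output, and running the kernel itself would need finer-grained timing than this sketch provides.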
References
Note that the reasons for the increased average runtimes for Rust and Numba over using Rust from Python for the larger matrix sizes in all three experiments is unclear from our experiments.
— Improving Runtime Performance of Tensor Computations using Rust From Python
(2510.01495 - Harding et al., 1 Oct 2025) in Section 4.2, Numerical Experiments: Dense Matrix-Vector Product