Explain Rust and Numba slowdown relative to Rust-from-Python for large dense matvecs

Determine why the native Rust implementation and the Numba JIT-compiled Python implementation of the dense matrix–vector product have higher average runtimes than the PyO3-based Rust-from-Python implementation at larger matrix sizes in the experiments, and identify the underlying factors responsible for this discrepancy.
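For reference, the computation being timed is y = Ax. A minimal Rust sketch of a straightforward row-major kernel follows. The function name `matvec`, the flat row-major `f64` buffer, and the iterator-based inner product are assumptions for illustration, not the implementation used in the paper. Choices like these (memory layout, element type, how the inner loop is written) may differ between a native Rust binary, a Numba kernel and a PyO3 extension, so writing the kernel out explicitly is one starting point for comparing them.

```rust
/// Dense matrix-vector product y = A x.
/// `a` holds an `nrows x ncols` matrix in row-major order.
/// Hypothetical sketch; not the kernel benchmarked by Harding et al.
fn matvec(a: &[f64], nrows: usize, ncols: usize, x: &[f64]) -> Vec<f64> {
    assert_eq!(a.len(), nrows * ncols, "matrix buffer has wrong length");
    assert_eq!(x.len(), ncols, "vector length must equal number of columns");
    if ncols == 0 {
        // An empty row contributes a zero dot product.
        return vec![0.0; nrows];
    }
    // Each output entry is the dot product of one contiguous row with x.
    a.chunks_exact(ncols)
        .map(|row| row.iter().zip(x).map(|(aij, xj)| aij * xj).sum::<f64>())
        .collect()
}

fn main() {
    // 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] stored row by row.
    let a = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let x = vec![1.0, 0.0, -1.0];
    let y = matvec(&a, 2, 3, &x);
    println!("{:?}", y); // prints [-2.0, -2.0]
}
```

Storing the matrix row by row keeps each inner product over contiguous memory, which is the usual layout for a row-oriented matvec.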

Background

The dense matrix–vector product was evaluated across multiple matrix size regimes. While Rust-from-Python consistently delivered substantial speedups over pure Python, the authors observed that, at larger matrix sizes, both the native Rust and the Numba implementations sometimes showed higher average runtimes than Rust-from-Python.

They explicitly state that the reasons for these increased runtimes are unclear from their experiments, motivating further investigation into the causes of this performance behavior.
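Investigating the discrepancy starts with measuring average runtime per call in a controlled way. Below is a sketch of such a measurement in native Rust. The helper `mean_runtime`, the warm-up and repetition counts, and the matrix sizes are hypothetical choices, not the authors' benchmark setup. Separating untimed warm-up calls from timed ones matters when comparing against Numba, which compiles a function on its first call for a given argument type signature; whether such one-off costs fall inside the timed region is one methodological detail worth checking across all three implementations.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Row-major dense matvec y = A x for a square `n x n` matrix (n > 0).
/// Hypothetical illustrative kernel, not the paper's implementation.
fn matvec(a: &[f64], n: usize, x: &[f64], y: &mut [f64]) {
    assert!(n > 0 && a.len() == n * n && x.len() == n && y.len() == n);
    for (yi, row) in y.iter_mut().zip(a.chunks_exact(n)) {
        *yi = row.iter().zip(x).map(|(aij, xj)| aij * xj).sum();
    }
}

/// Hypothetical helper: call `f` `warmup` times untimed, then `reps` times
/// timed, and return the mean wall-clock duration of one timed call.
fn mean_runtime<F: FnMut()>(mut f: F, warmup: usize, reps: u32) -> Duration {
    assert!(reps > 0, "need at least one timed repetition");
    for _ in 0..warmup {
        f();
    }
    let start = Instant::now();
    for _ in 0..reps {
        f();
    }
    start.elapsed() / reps
}

fn main() {
    // Hypothetical size sweep; the paper's actual sizes are not reproduced here.
    for &n in &[64usize, 256, 1024] {
        let a: Vec<f64> = (0..n * n).map(|k| (k % 7) as f64).collect();
        let x = vec![1.0; n];
        let mut y = vec![0.0; n];
        // black_box is a best-effort hint that discourages the compiler
        // from optimizing the benchmarked work away.
        let t = mean_runtime(|| matvec(black_box(&a), n, black_box(&x), &mut y), 3, 20);
        black_box(&y);
        println!("n = {n:5}: mean {:?} per matvec", t);
    }
}
```

Running the same sweep in release mode (`cargo run --release`) is advisable, since debug builds of Rust are typically far slower and would distort any comparison with an optimized extension module.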

References

Note that the reasons for the increased average runtimes for Rust and Numba over using Rust from Python for the larger matrix sizes in all three experiments is unclear from our experiments.

— Improving Runtime Performance of Tensor Computations using Rust From Python (2510.01495 - Harding et al., 1 Oct 2025) in Section 4.2, Numerical Experiments: Dense Matrix-Vector Product
