Identify sources of initial-call overhead when using PyO3 from Python

Determine the specific sources of the initial-call runtime overhead when invoking Rust code from Python via the PyO3 Rust crate and quantify the contribution of each source to the total overhead, including crossing the Python Foreign Function Interface boundary, loading the Python module, and performing data type conversions.
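One way to start attributing the overhead is to time the module import, the first call, and later calls separately. The sketch below is a hypothetical harness, not the authors' instrumentation: `time_phases` and its parameters are illustrative names, and the standard-library `math.fsum` stands in for a function exported by a PyO3 extension module. A module that is already in `sys.modules` will show a near-zero import time, so the import measurement is more meaningful in a fresh interpreter.

```python
import importlib
import statistics
import sys
import time


def time_phases(module_name, func_name, args, repeats=20):
    """Time import, first call, and steady-state calls separately.

    Hypothetical harness: all names here are illustrative only.
    """
    # Record whether the import will be served from the module cache.
    was_cached = module_name in sys.modules

    # Phase 1: module import (load and initialization).
    t0 = time.perf_counter()
    module = importlib.import_module(module_name)
    t_import = time.perf_counter() - t0

    func = getattr(module, func_name)

    # Phase 2: the first call, which may include one-time costs.
    t0 = time.perf_counter()
    func(*args)
    t_first = time.perf_counter() - t0

    # Phase 3: repeated calls; the median approximates steady state.
    steady = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        func(*args)
        steady.append(time.perf_counter() - t0)
    t_steady = statistics.median(steady)

    return {
        "import_was_cached": was_cached,
        "import_s": t_import,
        "first_call_s": t_first,
        "steady_call_s": t_steady,
        # Excess of the first call over a typical later call.
        "first_call_excess_s": t_first - t_steady,
    }


if __name__ == "__main__":
    # `math.fsum` is a stand-in for a PyO3-exported function.
    report = time_phases("math", "fsum", ([0.5] * 1000,))
    for key, value in report.items():
        print(f"{key}: {value}")
```

This separates import time from first-call excess, but it does not by itself distinguish FFI-crossing cost from data type conversion cost; that would need additional controls, such as calling the same function with inputs of different sizes and types.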

Background

The authors measure the runtime performance of tensor kernels implemented in Python, in Rust, and in Rust called from Python using PyO3. They note that, when Rust is called from Python, the first call in each Python script execution includes additional overhead, and they describe a warm-up step to control for it. Although they estimate the size of this overhead, they explicitly state that its specific sources are uncertain and call for future research to determine each source’s contribution.
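A warm-up step of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's timing code: `timed_after_warmup`, `warmup`, and `repeats` are hypothetical names, and the built-in `sum` stands in for a Rust kernel exposed through PyO3.

```python
import statistics
import time


def timed_after_warmup(func, *args, warmup=1, repeats=10):
    """Run untimed warm-up calls, then return per-call timings in seconds.

    Hypothetical helper: names and defaults are illustrative only.
    """
    # Warm-up calls are executed but not timed, so any one-time
    # first-call costs are kept out of the recorded measurements.
    for _ in range(warmup):
        func(*args)

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Built-in `sum` is a stand-in for a PyO3-exported kernel.
    data = [float(i) for i in range(10_000)]
    timings = timed_after_warmup(sum, data, warmup=1, repeats=10)
    print(f"median: {statistics.median(timings):.6e} s over {len(timings)} runs")
```

Setting `warmup=0` and comparing the first recorded timing with the rest gives a rough estimate of the first-call overhead that the warm-up is meant to exclude.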

Clarifying and attributing these overhead sources is important for interpreting cross-language performance comparisons and for guiding optimization efforts in PyO3-based extension modules.

References

The specific sources of the overhead cost for the first call are uncertain---possibilities include crossing the FFI boundary, loading the Python module, data type conversions, etc.---however, future research is required to determine the contribution of each source.

Improving Runtime Performance of Tensor Computations using Rust From Python (2510.01495 - Harding et al., 1 Oct 2025) in Section 3, Methodology (timing instrumentation paragraph)