Explain ThunderX2 Imbalance in Ondes3D

Determine why the Ondes3D seismic wave simulator shows load imbalance and a longer overall execution time on the ARM ThunderX2 99xx architecture, even though the CPML4 microkernel executes faster on ThunderX2 than on Intel Skylake. In particular, identify which kernels (intermediates, stress, velocity) or domain regions (Physical Domain, Absorbing Boundary Conditions, Free Surface) are responsible for the imbalance and performance degradation.

Background

The paper evaluates temporal load imbalance of the Ondes3D seismic simulator across eight multicore architectures, including Intel, AMD, and ARM ThunderX2. While AMD Zen 2 generally minimizes imbalance and achieves the best performance, ARM ThunderX2 exhibits the highest spatial imbalance and the longest execution time.

A microkernel analysis focusing on CPML4 shows that, counterintuitively, ThunderX2 executes this microkernel faster than Intel Skylake. This suggests the primary source of ThunderX2’s observed imbalance and performance issues likely lies elsewhere in the computation, motivating a focused investigation into which kernels or computational regions drive the imbalance on ThunderX2.
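One way such an investigation could be organized is to score each (kernel, region) pair by how unevenly its time is spread across MPI ranks and then rank the pairs. The sketch below is illustrative only: it assumes per-rank timings have already been collected, it uses the common percent-imbalance metric (max / mean - 1) x 100 rather than whatever metric the paper uses, and the function names and timing values are hypothetical, not measurements from ThunderX2.

```python
from statistics import mean


def percent_imbalance(times):
    """Return (max / mean - 1) * 100 for a list of per-rank times.

    0.0 means every rank spent the same time; larger values mean the
    slowest rank is further above the average.
    """
    avg = mean(times)
    if avg == 0:
        return 0.0
    return (max(times) / avg - 1.0) * 100.0


def rank_by_imbalance(timings):
    """Sort (kernel, region) pairs from most to least imbalanced.

    timings maps (kernel, region) -> list of times, one per MPI rank.
    """
    scores = {key: percent_imbalance(t) for key, t in timings.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical per-rank timings in seconds (4 ranks), for illustration only.
timings = {
    ("stress", "Physical Domain"): [1.0, 1.0, 1.0, 1.0],
    ("stress", "Absorbing Boundary Conditions"): [2.0, 1.0, 1.0, 0.0],
    ("velocity", "Free Surface"): [1.5, 0.5, 1.0, 1.0],
}

for (kernel, region), score in rank_by_imbalance(timings):
    print(f"{kernel:10s} {region:32s} {score:6.1f}%")
```

Running the same ranking on timings from ThunderX2 and from a reference platform such as Skylake would show whether the most imbalanced pairs differ between architectures, which is the comparison the open question asks for.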

References

Our microkernels investigation shows that ThunderX2 is faster than Intel, leaving an open research issue for future studies on its imbalance.

Temporal Load Imbalance on Ondes3D Seismic Simulator for Different Multicore Architectures (2409.11392 - Solórzano et al., 17 Sep 2024) in Section VI (Discussion and Conclusion)