- The paper presents the HiGPUs code, which combines a high-order Hermite integrator with block time steps to solve the gravitational N-body problem by direct summation, with no approximations.
- The code is fully parallelized, combining MPI, OpenMP, and GPU acceleration (CUDA or OpenCL), and reaches roughly 80% parallel efficiency on 256 GPUs.
- It scales to simulations of up to eight million particles, enabling high-precision astrophysical modeling at a scale that direct summation on conventional platforms could not reach.
An Advanced Fully Parallel N-Body Code for Hybrid Computing Platforms
The paper "A fully parallel, high precision, N-body code running on hybrid computing platforms" by Capuzzo-Dolcetta et al. details the development and performance evaluation of a novel computational tool, specifically designed to solve the classical gravitational N-body problem. This problem entails predicting the motion of N bodies under mutual gravitational attraction, a foundational issue in astrophysics with applications scaling from planetary systems to galaxy clusters.
Methodology and Innovation
The primary contribution of the paper is the code named HiGPUs, which combines a high-order Hermite integrator with block time steps for efficient integration of the particle dynamics. The code evaluates the particle-particle forces directly, keeping precision high by avoiding approximations that could introduce uncontrolled errors in the solutions.
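As a concrete illustration of what direct summation involves (a minimal sketch, not the actual HiGPUs kernel), a CUDA kernel for the softened particle-particle acceleration could look as follows; the kernel name, the one-thread-per-target layout, and the Plummer softening eps2 are assumptions made here for clarity:

```cuda
// Minimal sketch of a direct, softened O(N^2) acceleration kernel
// (illustrative only -- not the actual HiGPUs kernel).
// One thread accumulates the acceleration of one target particle i
// from all n sources; eps2 is an assumed Plummer softening squared.
__global__ void direct_acc(const double4 *pos,   // x, y, z, mass in .w
                           double3       *acc,
                           int            n,
                           double         eps2)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    double4 pi = pos[i];
    double ax = 0.0, ay = 0.0, az = 0.0;

    for (int j = 0; j < n; ++j) {
        double4 pj = pos[j];
        double dx = pj.x - pi.x;
        double dy = pj.y - pi.y;
        double dz = pj.z - pi.z;
        // the softening keeps the j == i term finite (and exactly zero,
        // since dx = dy = dz = 0), so no branch is needed
        double r2     = dx * dx + dy * dy + dz * dz + eps2;
        double inv_r  = rsqrt(r2);
        double inv_r3 = inv_r * inv_r * inv_r;
        ax += pj.w * dx * inv_r3;
        ay += pj.w * dy * inv_r3;
        az += pj.w * dz * inv_r3;
    }
    acc[i] = make_double3(ax, ay, az);
}
```

A full Hermite predictor-corrector would also accumulate the jerk (and, for higher orders, further derivatives) in the same loop, but the O(N²) pair structure is the same.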
A major factor in HiGPUs' performance is its full parallelization across a hybrid architecture that combines CPUs and GPUs. Specifically, the code uses OpenMP and MPI to exploit the multicore CPUs of the host nodes, alongside either CUDA or OpenCL for GPU acceleration. The use of GPUs as high-performance computing accelerators, able to work in both single and double precision, is a key aspect that distinguishes this work from previous efforts.
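To illustrate how the MPI layer can wrap around such a kernel, the hedged sketch below (again not the authors' implementation: the function names, the contiguous source decomposition, and the one-GPU-per-rank mapping are all illustrative) has each rank compute partial accelerations of every particle against only its local slice of sources, and then sums the partial results with MPI_Allreduce:

```cuda
#include <mpi.h>
#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

// Same pair loop as the previous sketch, restricted to the source
// slice [j_begin, j_end) owned by this MPI rank.
__global__ void partial_acc(const double4 *pos, double3 *acc,
                            int n, int j_begin, int j_end, double eps2)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    double4 pi = pos[i];
    double ax = 0.0, ay = 0.0, az = 0.0;
    for (int j = j_begin; j < j_end; ++j) {
        double4 pj = pos[j];
        double dx = pj.x - pi.x, dy = pj.y - pi.y, dz = pj.z - pi.z;
        double inv_r = rsqrt(dx * dx + dy * dy + dz * dz + eps2);
        double f     = pj.w * inv_r * inv_r * inv_r;
        ax += f * dx;  ay += f * dy;  az += f * dz;
    }
    acc[i] = make_double3(ax, ay, az);
}

// Host side: one GPU per MPI rank (device selection assumed done at start-up).
void compute_accelerations(const double4 *d_pos, double3 *d_acc,
                           std::vector<double> &h_acc,   // 3 * n doubles
                           int n, double eps2)
{
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int chunk   = (n + size - 1) / size;          // contiguous source slice
    int j_begin = rank * chunk;
    int j_end   = std::min(n, j_begin + chunk);

    int threads = 128, blocks = (n + threads - 1) / threads;
    partial_acc<<<blocks, threads>>>(d_pos, d_acc, n, j_begin, j_end, eps2);

    // blocking copy also synchronizes with the kernel on the default stream
    cudaMemcpy(h_acc.data(), d_acc, 3 * n * sizeof(double),
               cudaMemcpyDeviceToHost);

    // every rank ends up with the total acceleration on every particle
    MPI_Allreduce(MPI_IN_PLACE, h_acc.data(), 3 * n,
                  MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
}
```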
Performance and Scalability
The authors validated the code on the IBM iDataPlex DX360M3 Linux Infiniband Cluster, deploying up to 256 GPUs on systems of up to eight million bodies. HiGPUs performed direct-summation simulations at a scale that is otherwise impractical, following the evolution of these systems for several crossing times with good accuracy.
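For reference, the crossing time is roughly the time a typical body needs to traverse the system,

$$t_{\mathrm{cr}} \sim \frac{2R}{\sigma},$$

with R a characteristic radius and σ the velocity dispersion; the precise definition adopted in the paper may differ by a numerical factor.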
The scalability of the code is noteworthy: HiGPUs maintains a parallel efficiency of up to approximately 80% when employing 256 GPUs, showing that the parallelization copes well with the O(N²) cost inherent to direct summation. The authors report a sustained performance of over 100 TFLOPS, equivalent to an average of about 400 GFLOPS per GPU, underscoring how effectively the software uses the GPU resources.
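The two figures quoted are mutually consistent:

$$256\ \text{GPUs} \times 400\ \text{GFLOPS per GPU} \approx 102\ \text{TFLOPS},$$

i.e. just above the 100 TFLOPS aggregate reported.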
Technical Analysis
The paper breaks down the timing of the individual code sections, using execution speed and efficiency metrics to locate the performance bottlenecks. The computational load turns out to be well balanced once the number of particles exceeds certain thresholds, at which point the GPUs are fully exploited. The authors identify room for further optimization, particularly in reducing communication overhead through better MPI usage and in offloading more host-side work to OpenMP threads.
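The paper does not spell these optimizations out in code, but a hedged sketch of the two directions (hiding the MPI reduction behind independent host work with a non-blocking collective, and using OpenMP for the CPU-side block time-step bookkeeping) could look as follows; all names, and the particular power-of-two time-step quantization shown, are assumptions rather than details taken from the paper:

```cuda
#include <mpi.h>
#include <omp.h>
#include <vector>

// Sketch only: overlap the inter-node reduction of partial accelerations
// with host-side block time-step bookkeeping, then wait before the corrector.
void reduce_and_reblock(std::vector<double>       &acc_partial, // 3 * n doubles
                        std::vector<double>       &dt_block,    // per-particle steps
                        const std::vector<double> &dt_estimate, // new "ideal" steps (> 0)
                        double dt_max, int n)
{
    // 1. non-blocking sum of the per-rank partial accelerations (MPI-3)
    MPI_Request req;
    MPI_Iallreduce(MPI_IN_PLACE, acc_partial.data(), 3 * n,
                   MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD, &req);

    // 2. meanwhile, CPU-only work on OpenMP threads: quantize each particle's
    //    estimated step down to a power-of-two fraction of dt_max, as block
    //    time-step schemes require
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i) {
        double dt = dt_max;
        while (dt > dt_estimate[i]) dt *= 0.5;
        dt_block[i] = dt;
    }

    // 3. the corrector needs the full accelerations, so wait for them here
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}
```

Whether such an overlap actually pays off depends on how long the host-side work is relative to the reduction; the profiling reported in the paper is what would identify where it helps.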
Future Implications
This research holds significant implications, both practical and theoretical, for astrophysical simulation and high performance computing. Practically, the ability to simulate large N-body systems with high accuracy at acceptable computational cost paves the way for more precise astrophysical models and predictions. Theoretically, the hybrid computing approach demonstrated here can influence future computational software design, encouraging the integration of GPUs into other complex scientific computations.
Given the memory limits per GPU on current architectures, the paper also suggests exploring alternative GPU architectures or configurations to further extend the number of particles that can be handled. Moreover, HiGPUs could serve as a valuable tool for research into dense stellar systems, informing our understanding of phenomena such as globular cluster dynamics.
Overall, this research exemplifies how leveraging advances in parallel computing technologies can markedly enhance the scale and precision of critical astrophysical simulations, thus offering a solid framework for further developments in the computational sciences.