Overview of "Zhores" Petaflops Supercomputer at Skolkovo Institute
The paper describes the architecture, installation, and initial benchmarks of the "Zhores" Petaflops supercomputer at the Skolkovo Institute of Science and Technology's Center for Computational and Data-Intensive Science and Engineering (CDISE). Zhores is designed to support research in data-driven modeling, machine learning, and artificial intelligence, and combines CPU and GPU nodes in a single cluster. Its target applications span several scientific domains, including digital pharma, predictive analytics, and image processing.
Hardware and Network Architecture
Zhores is built from DELL PowerEdge servers equipped with Intel Xeon CPUs and NVIDIA V100 GPUs. The nodes are interconnected with Mellanox EDR InfiniBand. The CPU nodes are configured for throughput computing, and their performance scaling is confirmed by benchmarks of floating-point throughput and memory bandwidth. The GPU nodes use NVLink for fast GPU-to-GPU communication within a node. The paper also identifies potential oversubscription of network resources and flags it as an area for future optimization.
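The oversubscription mentioned above can be illustrated with a short calculation. The sketch below is not taken from the paper: the port counts are hypothetical, and the only fixed input is the nominal 100 Gb/s rate of an EDR InfiniBand link. It computes the ratio of node-facing to uplink bandwidth on a single leaf switch and the bandwidth each node could expect if all nodes sent traffic off-switch at once.

```python
EDR_GBPS = 100.0  # nominal EDR InfiniBand link rate in Gb/s


def oversubscription_ratio(downlinks: int, uplinks: int) -> float:
    """Ratio of node-facing ports to uplink ports on one leaf switch.

    Assumes every port runs at the same link rate, so the port ratio
    equals the bandwidth ratio. A value of 1.0 means non-blocking.
    """
    if downlinks <= 0 or uplinks <= 0:
        raise ValueError("port counts must be positive")
    return downlinks / uplinks


def worst_case_node_bandwidth(downlinks: int, uplinks: int,
                              link_gbps: float = EDR_GBPS) -> float:
    """Per-node off-switch bandwidth when all nodes transmit at once."""
    ratio = oversubscription_ratio(downlinks, uplinks)
    # A ratio below 1.0 cannot give a node more than its own link rate.
    return link_gbps / max(ratio, 1.0)


if __name__ == "__main__":
    # Hypothetical leaf switch: 24 ports to nodes, 12 ports to the spine.
    down, up = 24, 12
    print(f"oversubscription: {oversubscription_ratio(down, up):.1f}:1")
    print(f"worst-case per-node bandwidth: "
          f"{worst_case_node_bandwidth(down, up):.0f} Gb/s")
```

With these illustrative port counts the switch is oversubscribed 2:1, so traffic leaving the switch is limited to half the link rate per node under full load. The actual Zhores topology may differ.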
Software and System Management
CentOS Linux serves as the operating system, with the Luna management software handling provisioning and cluster operations. OS images are deployed over BitTorrent, which lets provisioning scale efficiently across nodes. SLURM manages the workload, scheduling jobs across the heterogeneous CPU and GPU nodes of Zhores. Docker containers add a layer of isolation that supports secure multi-user operation, as demonstrated during the Neurohackathon event.
Performance Benchmarks
Performance benchmarks validate the supercomputer's capabilities: Linpack tests reach 120.2 TFlop/s on the CPU nodes and 496 TFlop/s on the GPU nodes. Applications in kinetic modeling and molecular dynamics show good strong scaling, with notable efficiency gains in CPU-bound advection-coagulation simulations and GPU-accelerated Gromacs runs. These results confirm that Zhores is suited both to standard High-Performance Computing (HPC) workloads and to complex domain-specific tasks.
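Strong scaling, as referred to above, measures how the run time of a fixed-size problem falls as more workers are added. A minimal sketch of the usual speedup and parallel-efficiency calculation follows. The timings in the example are hypothetical placeholders, not measurements from the paper, and the function name is chosen here for illustration.

```python
from typing import Dict, List, Tuple


def strong_scaling(timings: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Compute speedup and parallel efficiency for a fixed problem size.

    timings maps a worker count (cores, nodes, or GPUs) to wall time in
    seconds. The smallest worker count is used as the baseline.
    Returns (workers, speedup, efficiency) tuples sorted by worker count.
    """
    if not timings:
        raise ValueError("timings must not be empty")
    base_workers = min(timings)
    base_time = timings[base_workers]
    rows = []
    for workers in sorted(timings):
        speedup = base_time / timings[workers]
        # Ideal speedup relative to the baseline is workers / base_workers.
        efficiency = speedup * base_workers / workers
        rows.append((workers, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Hypothetical wall times in seconds, for illustration only.
    measured = {1: 100.0, 2: 52.0, 4: 28.0, 8: 16.0}
    for workers, speedup, efficiency in strong_scaling(measured):
        print(f"{workers:>3} workers  speedup {speedup:5.2f}  "
              f"efficiency {efficiency:6.1%}")
```

An efficiency close to 1.0 means the added workers are almost fully used; a steady decline usually points to communication or memory-bandwidth limits.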
Implications and Prospects
The deployment of Zhores significantly expands the computational capabilities available to Russian scientific institutions and places Skolkovo's infrastructure among the nation's top supercomputers. The system architecture reflects the current convergence of HPC with Big Data analytics and AI, providing a platform for multidisciplinary research. Looking forward, future development might focus on network optimization and expansion of the heterogeneous computing resources, which could further broaden the applicability of Zhores to emerging AI-driven scientific investigations.
In summary, the Zhores supercomputer is a key research instrument at the Skolkovo Institute, combining substantial computing capacity with a flexible operational design to support work across a wide range of scientific fields.