TANQ-Sim: Tensorcore Accelerated Noisy Quantum System Simulation via QIR on Perlmutter HPC (2404.13184v2)
Abstract: Although there have been remarkable advances in quantum computing (QC), it remains crucial to simulate quantum programs using classical large-scale parallel computing systems to validate quantum algorithms, comprehend the impact of noise, and develop resilient quantum applications. This is particularly important for bridging the gap between near-term noisy intermediate-scale quantum (NISQ) computing and future fault-tolerant quantum computing (FTQC). Nevertheless, current simulation methods either cannot simulate noise, incur excessive computational cost when they do, or do not scale out effectively. In this paper, we propose TANQ-Sim, a full-scale density-matrix-based simulator designed to simulate practical deep circuits with both coherent and non-coherent noise. To address the significant computational cost associated with such simulations, we propose a new density-matrix simulation approach that enables TANQ-Sim to leverage the latest double-precision tensorcores (DPTCs) in NVIDIA Ampere and Hopper GPUs. To the best of our knowledge, this is the first application of double-precision tensorcores for non-AI/ML workloads. To optimize performance, we also propose specific gate fusion techniques for density-matrix simulation. For scaling, we rely on the advanced GPU-side communication library NVSHMEM and propose effective optimization methods for enhancing communication efficiency. Evaluations on the NERSC Perlmutter supercomputer demonstrate the functionality, performance, and scalability of the simulator. We also present three case studies to showcase the practical usage of TANQ-Sim, including teleportation, entanglement distillation, and Ising simulation. TANQ-Sim will be released on GitHub.
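To make the notion of density-matrix simulation with noise concrete, the following is a minimal single-qubit sketch in plain Python. It is not TANQ-Sim's method: it uses no tensorcores, gate fusion, or NVSHMEM, and every name in it (`apply_unitary`, `apply_kraus`, `depolarizing_kraus`, `run_example`) is hypothetical and chosen only for illustration. It assumes the standard textbook picture in which a gate U acts as ρ → UρU† and a noise channel acts through Kraus operators as ρ → Σ_k K_k ρ K_k†, with a depolarizing channel of strength p standing in for non-coherent noise.

```python
import math

def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Conjugate transpose."""
    return [[a[j][i].conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def apply_unitary(rho, u):
    """Noise-free gate: rho -> U rho U^dagger."""
    return matmul(matmul(u, rho), dagger(u))

def apply_kraus(rho, kraus_ops):
    """Noise channel: rho -> sum_k K_k rho K_k^dagger."""
    n = len(rho)
    out = [[0j] * n for _ in range(n)]
    for k in kraus_ops:
        term = matmul(matmul(k, rho), dagger(k))
        for i in range(n):
            for j in range(n):
                out[i][j] += term[i][j]
    return out

def depolarizing_kraus(p):
    """Kraus operators for rho -> (1 - p) rho + p I/2 (assumed noise model)."""
    s0, s1 = math.sqrt(1 - 3 * p / 4), math.sqrt(p / 4)
    identity = [[1, 0], [0, 1]]
    pauli_x = [[0, 1], [1, 0]]
    pauli_y = [[0, -1j], [1j, 0]]
    pauli_z = [[1, 0], [0, -1]]
    scale = lambda s, m: [[s * v for v in row] for row in m]
    return [scale(s0, identity), scale(s1, pauli_x),
            scale(s1, pauli_y), scale(s1, pauli_z)]

def run_example(p):
    """Start in |0><0|, apply a Hadamard, then depolarizing noise."""
    h = 1 / math.sqrt(2)
    hadamard = [[h, h], [h, -h]]
    rho = [[1, 0], [0, 0]]
    rho = apply_unitary(rho, hadamard)
    return apply_kraus(rho, depolarizing_kraus(p))

rho = run_example(0.1)
purity = trace(matmul(rho, rho)).real  # below 1 for a mixed (noisy) state
```

The density matrix of n qubits has 4^n entries, compared with 2^n amplitudes for a state vector, which is why the cost of this style of simulation grows so quickly and why the abstract emphasizes hardware acceleration and scale-out.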