Quantum Circuit Simulation by SGEMM Emulation on Tensor Cores and Automatic Precision Selection (2303.08989v2)
Abstract: Quantum circuit simulation provides the foundation for the development of quantum algorithms and the verification of quantum supremacy. Among the various methods for quantum circuit simulation, tensor network contraction has been increasing in popularity due to its ability to simulate a larger number of qubits. During tensor contraction, the input tensors are reshaped to matrices and multiplied by a GEMM operation, and these GEMM operations can account for up to 90% of the total calculation time. GEMM throughput can be improved by utilizing mixed-precision hardware such as Tensor Cores, but a straightforward implementation results in insufficient fidelity for deep and large quantum circuits. Prior work has demonstrated that compensated summation with special care of the rounding mode can fully recover the FP32 precision of SGEMM even when using TF32 or FP16 Tensor Cores. The exponent range is a critical issue when applying such techniques to quantum circuit simulation: while TF32 supports almost the same exponent range as FP32, FP16 supports a much smaller one. In this work, we use the exponent range statistics of input tensor elements to select which Tensor Cores to use for the GEMM. We evaluate our method on Random Circuit Sampling (RCS), including Sycamore's quantum circuit, and show that throughput is up to 1.86 times higher while accuracy is maintained.
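The selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`exponent_range`, `select_tensor_core`), the `margin` parameter, and the decision rule (fall back to TF32 whenever any nonzero element lies outside the FP16 normal exponent range) are assumptions made for illustration. The paper speaks of exponent range statistics, and its actual criterion may differ.

```python
import math

# FP16 normal numbers span 2**-14 up to 65504 (just under 2**16), so their
# unbiased base-2 exponents lie in [-14, 15]. Using this window as the
# "safe" range is an assumption of this sketch.
FP16_MIN_EXP = -14
FP16_MAX_EXP = 15


def exponent_range(values):
    """Return (min, max) base-2 exponent over nonzero elements, or None."""
    exps = []
    for v in values:
        # .real/.imag exist on both float and complex, so one loop covers both.
        for part in (v.real, v.imag):
            if part != 0.0:
                # math.frexp gives (m, e) with 0.5 <= |m| < 1; the IEEE-style
                # exponent (1 <= |m'| < 2) is therefore e - 1.
                exps.append(math.frexp(part)[1] - 1)
    return (min(exps), max(exps)) if exps else None


def select_tensor_core(a, b, margin=0):
    """Pick "fp16" if both inputs fit the FP16 exponent window, else "tf32".

    `margin` (hypothetical) shrinks the window to leave headroom, e.g. for
    scaled residual terms in an error-corrected GEMM.
    """
    for rng in (exponent_range(a), exponent_range(b)):
        if rng is None:  # all-zero input cannot overflow or underflow
            continue
        lo, hi = rng
        if lo < FP16_MIN_EXP + margin or hi > FP16_MAX_EXP - margin:
            return "tf32"
    return "fp16"
```

For example, `select_tensor_core([1.0, 0.5], [0.25 + 0.5j])` selects FP16, whereas an input containing `1e-6` (exponent -20) falls outside the FP16 window and selects TF32. In a real simulator the inputs would be the flattened matrices passed to each GEMM, and the scan would more plausibly run on the GPU.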