TpuGraphs: A Performance Prediction Dataset on Large Tensor Computational Graphs (2308.13490v3)
Abstract: Precise hardware performance models play a crucial role in code optimization. They can assist compilers in making heuristic decisions or aid autotuners in identifying the optimal configuration for a given program. For example, the autotuner for XLA, a machine learning compiler, discovered a 10-20% speedup on state-of-the-art models serving substantial production traffic at Google. Although a few datasets for program performance prediction exist, they target small sub-programs such as basic blocks or kernels. This paper introduces TpuGraphs, a performance prediction dataset of full tensor programs, represented as computational graphs, running on Tensor Processing Units (TPUs). Each graph in the dataset represents the main computation of a machine learning workload, e.g., a training epoch or an inference step. Each data sample contains a computational graph, a compilation configuration, and the execution time of the graph when compiled with that configuration. The graphs in the dataset are collected from open-source machine learning programs and feature popular model architectures such as ResNet, EfficientNet, Mask R-CNN, and Transformer. TpuGraphs provides 25x more graphs than the largest existing graph property prediction dataset (with comparable graph sizes), and its graphs are on average 770x larger than those in existing performance prediction datasets on machine learning programs. This graph-level prediction task on large graphs introduces new challenges in learning, ranging from scalability and training efficiency to model quality.
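For concreteness, below is a minimal sketch (in Python) of how one (graph, configuration, runtime) sample of the kind described above might be represented. All field names, shapes, and the `TpuGraphsSample` / `rank_configs` helpers are illustrative assumptions for exposition, not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class TpuGraphsSample:
    """Hypothetical container mirroring the sample structure in the abstract:
    a computational graph, a compilation configuration, and a measured runtime."""
    node_features: np.ndarray    # (num_nodes, feat_dim): per-operation features
    edge_index: np.ndarray       # (num_edges, 2): directed data-flow edges
    config_features: np.ndarray  # flattened compilation-configuration features
    runtime_ns: float            # execution time of the graph under this config

def rank_configs(samples: List[TpuGraphsSample]) -> List[TpuGraphsSample]:
    """Order candidate configurations of one graph from fastest to slowest --
    the kind of ranking an autotuner would want a learned cost model to predict."""
    return sorted(samples, key=lambda s: s.runtime_ns)
```

In this framing, a learned performance model takes the graph and configuration features as input and predicts the runtime (or a runtime ordering over configurations), which is what makes graph-level prediction on such large graphs challenging.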