
Graph neural networks with configuration cross-attention for tensor compilers (2405.16623v2)

Published 26 May 2024 in cs.LG, cs.AR, and cs.PF

Abstract: With the recent popularity of neural networks comes the need for efficient serving of inference workloads. A neural network inference workload can be represented as a computational graph whose nodes are operators transforming multidimensional tensors. The tensors can be transposed and/or tiled in a combinatorially large number of ways, with some configurations leading to accelerated inference. We propose TGraph, a neural graph architecture that screens for fast configurations of the target computational graph, thus acting as an AI-driven tensor compiler in contrast to traditional heuristics-based compilers. The proposed solution improves the mean Kendall's $\tau$ across the layout collections of TpuGraphs from 29.8% (the reliable baseline) to 67.4% (TGraph). We estimate the potential CO$_2$ emission reduction associated with our work to be equivalent to over 50% of the total household emissions in the areas hosting AI-oriented data centers.
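The abstract describes ranking candidate tensor-layout configurations by letting configuration embeddings cross-attend to a graph-neural-network encoding of the computational graph, with the resulting ordering evaluated via Kendall's $\tau$. Below is a minimal, hypothetical PyTorch sketch of that idea. The class name `ConfigCrossAttentionScorer`, the single mean-aggregation round standing in for the paper's GNN stack, and all feature dimensions are illustrative assumptions, not the authors' TGraph implementation.

```python
import torch
import torch.nn as nn


class ConfigCrossAttentionScorer(nn.Module):
    """Scores candidate layout configurations for one computational graph.

    Hypothetical sketch: a single mean-aggregation round stands in for a
    full GNN stack, and configuration queries cross-attend to node states.
    """

    def __init__(self, node_dim=64, cfg_dim=32, hidden=64, heads=4):
        super().__init__()
        self.node_enc = nn.Linear(node_dim, hidden)  # per-operator features
        self.cfg_enc = nn.Linear(cfg_dim, hidden)    # per-configuration features
        # Cross-attention: configuration queries attend to node keys/values.
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.score = nn.Linear(hidden, 1)

    def forward(self, node_feats, adj, cfg_feats):
        # node_feats: (N, node_dim), adj: (N, N), cfg_feats: (C, cfg_dim)
        h = torch.relu(self.node_enc(node_feats))
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        h = (adj @ h) / deg                            # mean over neighbors
        nodes = h.unsqueeze(0)                         # (1, N, hidden)
        cfgs = self.cfg_enc(cfg_feats).unsqueeze(0)    # (1, C, hidden)
        ctx, _ = self.attn(query=cfgs, key=nodes, value=nodes)
        return self.score(ctx).squeeze(-1).squeeze(0)  # (C,) speed scores
```

A toy usage example, scoring eight candidate configurations and comparing the predicted ordering against (here, randomly generated stand-in) measured runtimes with SciPy's Kendall's $\tau$, mirroring the paper's evaluation metric:

```python
from scipy.stats import kendalltau

model = ConfigCrossAttentionScorer()
N, C = 10, 8
node_feats = torch.randn(N, 64)          # toy operator features
adj = (torch.rand(N, N) < 0.3).float()   # toy adjacency matrix
cfg_feats = torch.randn(C, 32)           # 8 candidate layout configurations
pred = model(node_feats, adj, cfg_feats).detach().numpy()
measured = torch.randn(C).numpy()        # stand-in for measured runtimes
tau, _ = kendalltau(pred, measured)
print(f"Kendall's tau between predicted and measured orderings: {tau:.3f}")
```

One appeal of the cross-attention formulation is that the graph is encoded once while each candidate configuration builds its own attention-weighted summary of the nodes, so scoring many configurations of the same graph stays cheap relative to re-encoding the graph per candidate.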

