GEVO-ML: Optimizing Machine Learning Code with Evolutionary Computation (2310.10211v1)
Abstract: Parallel accelerators, such as GPUs, are key enablers for large-scale Machine Learning (ML) applications. However, ML model developers often lack detailed knowledge of the underlying system architectures, while system programmers usually lack a high-level understanding of the ML models that run on those systems. To bridge this gap between the two domains of expertise, this paper proposes GEVO-ML, a tool for automatically discovering optimization opportunities and tuning the performance of ML kernels, where the model and the training/prediction processes are uniformly represented in a single intermediate language, the Multi-Level Intermediate Representation (MLIR). GEVO-ML uses multi-objective evolutionary search to find edits (mutations) to MLIR code that ultimately runs on GPUs, improving performance on the desired criteria while retaining required functionality. We demonstrate GEVO-ML on two different ML workloads, covering both model training and prediction. GEVO-ML finds significant Pareto improvements for these models, achieving a 90.43% performance improvement when model accuracy is relaxed by 2% (from 91.2% to 89.3%). For the training workloads, GEVO-ML finds a 4.88% improvement in model accuracy (from 91% to 96%) without sacrificing training or testing speed. Our analysis of key GEVO-ML mutations reveals diverse code modifications that, while often foreign to human developers, achieve effects similar to how humans improve model design, for example, by changing learning rates or pruning non-essential layer parameters.
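To make the search loop the abstract describes more concrete, here is a minimal, self-contained Python sketch of multi-objective evolutionary search over variable-length lists of IR edits. Everything in it is a hypothetical stand-in rather than GEVO-ML's actual implementation: the edit vocabulary, the integer "IR locations", and especially `evaluate()`, which in the real system would apply the edits to MLIR, recompile, run the ML workload on a GPU, and measure runtime and accuracy; the toy surrogate below merely gives the loop two objectives to trade off. Survival uses a simplified elitist Pareto filter rather than full NSGA-II.

```python
import random

# Hypothetical edit vocabulary; GEVO-style systems mutate IR instructions.
EDIT_OPS = ["delete-instruction", "swap-operands", "replace-constant", "copy-instruction"]

def random_edit():
    # An edit is an (operation, IR-location) pair; locations are abstract ints here.
    return (random.choice(EDIT_OPS), random.randrange(100))

def mutate(genome):
    """Insert, remove, or replace one edit in a variable-length genome."""
    genome = list(genome)
    roll = random.random()
    if roll < 0.4 or not genome:
        genome.append(random_edit())                            # grow
    elif roll < 0.7:
        genome.pop(random.randrange(len(genome)))               # shrink
    else:
        genome[random.randrange(len(genome))] = random_edit()   # replace
    return genome

def evaluate(genome):
    """Toy surrogate fitness: (runtime to minimize, accuracy to maximize).

    A real evaluator would compile the mutated IR and measure the workload.
    Seeded per-genome so repeated evaluations are deterministic.
    """
    rng = random.Random(hash(tuple(genome)) & 0xFFFFFFFF)
    runtime = max(0.1, 1.0 + 0.05 * len(genome) - rng.random())
    accuracy = min(1.0, max(0.0, 0.912 - 0.002 * len(genome) + 0.01 * rng.random()))
    return runtime, accuracy

def dominates(fa, fb):
    """fa Pareto-dominates fb: no worse on both objectives, better on one."""
    (ra, aa), (rb, ab) = fa, fb
    return ra <= rb and aa >= ab and (ra < rb or aa > ab)

def pareto_front(scored):
    return [(g, f) for g, f in scored
            if not any(dominates(f2, f) for _, f2 in scored)]

def search(pop_size=32, generations=50, seed=0):
    random.seed(seed)
    population = [[random_edit()] for _ in range(pop_size)]
    front = []
    for _ in range(generations):
        offspring = [mutate(random.choice(population)) for _ in range(pop_size)]
        scored = [(g, evaluate(g)) for g in population + offspring]
        front = pareto_front(scored)
        # Elitist survival: keep non-dominated genomes, refill by mutation.
        population = [g for g, _ in front][:pop_size]
        while len(population) < pop_size:
            population.append(mutate(random.choice(population)))
    return front

if __name__ == "__main__":
    for genome, (runtime, acc) in sorted(search(), key=lambda x: x[1][0]):
        print(f"runtime={runtime:.3f}  accuracy={acc:.3f}  #edits={len(genome)}")
```

The variable-length genome mirrors the paper's setting, where an individual is an accumulated list of mutations to the IR rather than a fixed parameter vector, and the returned Pareto front exposes the speed-versus-accuracy trade-offs the abstract reports.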