Using Rewrite Strategies for Efficient Functional Automatic Differentiation (2307.02447v2)
Abstract: Automatic Differentiation (AD) has become a dominant technique in ML. AD frameworks were first implemented for imperative languages using tapes. Meanwhile, functional implementations of AD have been developed, often based on dual numbers, which are close to the formal specification of differentiation and hence easier to prove correct. But these papers have focussed on correctness, not efficiency. Recently, it was shown how an approach using dual numbers could be made efficient through the right optimizations. Optimizations are highly dependent on order, as one optimization can enable another. It can therefore be useful to have fine-grained control over the scheduling of optimizations. One method expresses compiler optimizations as rewrite rules, whose application can be combined and controlled using strategy languages. Previous work describes the use of term rewriting and strategies to generate high-performance code in a compiler for a functional language. In this work, we implement dual-number AD in a functional array programming language using rewrite rules and strategy combinators for optimization. We aim to combine the elegance of differentiation using dual numbers with a succinct expression of the optimization schedule using a strategy language. We give preliminary evidence suggesting the viability of the approach on a micro-benchmark.
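To make the combination described in the abstract concrete, here is a minimal sketch in Python. It is not the authors' implementation or language. The expression type, the rule names (`mul_zero`, `mul_one`, `add_zero`) and the combinators (`choice`, `repeat`, `bottom_up`) are all hypothetical and chosen only for illustration. A dual-number transform turns a scalar expression into a (primal, tangent) pair, and a strategy built from the combinators then decides where and how often the simplification rules are applied to the tangent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple, Union


# A tiny expression language (hypothetical, for illustration only).
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Const:
    value: float

@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"

Expr = Union[Var, Const, Add, Mul]
Strategy = Callable[[Expr], Optional[Expr]]  # None means "did not apply"
ZERO, ONE = Const(0.0), Const(1.0)


def dual(e: Expr, wrt: str) -> Tuple[Expr, Expr]:
    """Dual-number transform: return (primal, tangent) expressions for e."""
    if isinstance(e, Const):
        return e, ZERO
    if isinstance(e, Var):
        return e, (ONE if e.name == wrt else ZERO)
    lp, lt = dual(e.left, wrt)
    rp, rt = dual(e.right, wrt)
    if isinstance(e, Add):
        return Add(lp, rp), Add(lt, rt)
    return Mul(lp, rp), Add(Mul(lt, rp), Mul(lp, rt))  # product rule


# Rewrite rules: each matches one algebraic identity at the root of a term.
# (mul_zero is valid over the reals; it ignores floating-point NaN/inf.)
def mul_zero(e: Expr) -> Optional[Expr]:
    return ZERO if isinstance(e, Mul) and ZERO in (e.left, e.right) else None

def mul_one(e: Expr) -> Optional[Expr]:
    if isinstance(e, Mul):
        if e.left == ONE:
            return e.right
        if e.right == ONE:
            return e.left
    return None

def add_zero(e: Expr) -> Optional[Expr]:
    if isinstance(e, Add):
        if e.left == ZERO:
            return e.right
        if e.right == ZERO:
            return e.left
    return None


# Strategy combinators: they control where and how often rules fire.
def choice(*strategies: Strategy) -> Strategy:
    """Apply the first strategy that succeeds; fail if none does."""
    def run(e: Expr) -> Optional[Expr]:
        for s in strategies:
            result = s(e)
            if result is not None:
                return result
        return None
    return run

def repeat(s: Strategy) -> Strategy:
    """Apply s until it fails; always succeeds with the last term."""
    def run(e: Expr) -> Optional[Expr]:
        while True:
            result = s(e)
            if result is None:
                return e
            e = result
    return run

def bottom_up(s: Strategy) -> Strategy:
    """Rewrite children first, then the rebuilt node; s must not fail."""
    def run(e: Expr) -> Optional[Expr]:
        if isinstance(e, (Add, Mul)):
            e = type(e)(run(e.left), run(e.right))
        return s(e)
    return run

# The "schedule": exhaustively try the three rules at every node, leaves first.
simplify = bottom_up(repeat(choice(mul_zero, mul_one, add_zero)))


def evaluate(e: Expr, env: Dict[str, float]) -> float:
    """Reference interpreter, used to check that rewriting preserves meaning."""
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Var):
        return env[e.name]
    l, r = evaluate(e.left, env), evaluate(e.right, env)
    return l + r if isinstance(e, Add) else l * r
```

For example, applying `dual` to the expression for `x*x + 3*x` with respect to `x` yields the tangent `(1*x + x*1) + (0*x + 3*1)`, which `simplify` reduces to `(x + x) + 3`. The rules and the traversal are separate values, so a different schedule (say, a different rule order or a top-down traversal) can be expressed by recombining the same pieces rather than rewriting the rules.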