Higher Order Automatic Differentiation of Higher Order Functions (2101.06757v7)

Published 17 Jan 2021 in cs.PL and cs.LO

Abstract: We present semantic correctness proofs of automatic differentiation (AD). We consider a forward-mode AD method on a higher-order language with algebraic data types, and we characterise it as the unique structure-preserving macro given a choice of derivatives for basic operations. We describe a rich semantics for differentiable programming, based on diffeological spaces. We show that it interprets our language, and we phrase what it means for the AD method to be correct with respect to this semantics. We show that our characterisation of AD gives rise to an elegant semantic proof of its correctness, based on a gluing construction on diffeological spaces. We explain how this is, in essence, a logical relations argument. Throughout, we show how the analysis extends to AD methods for computing higher-order derivatives using a Taylor approximation.
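
To make the forward-mode method concrete, here is a minimal sketch in Haskell, assuming nothing from the paper's formalisation: dual numbers implement first-order forward AD, and a degree-2 variant hints at the Taylor-style extension to higher-order derivatives mentioned in the abstract. The names `Dual`, `Dual2`, `sinD`, `deriv` and `deriv2` are our own illustration, not the paper's macro or its diffeological semantics.

```haskell
-- First-order forward-mode AD via dual numbers: each value carries
-- its derivative (tangent) alongside its primal value.
data Dual = Dual { primal :: Double, tangent :: Double }

instance Num Dual where
  Dual x dx + Dual y dy = Dual (x + y) (dx + dy)
  Dual x dx * Dual y dy = Dual (x * y) (x * dy + dx * y)  -- product rule
  negate (Dual x dx)    = Dual (negate x) (negate dx)
  fromInteger n         = Dual (fromInteger n) 0
  abs    = error "abs: not differentiable at 0"
  signum = error "signum: not differentiable at 0"

-- A chosen derivative for a basic operation, in the spirit of the
-- abstract's "choice of derivatives for basic operations".
sinD :: Dual -> Dual
sinD (Dual x dx) = Dual (sin x) (cos x * dx)

-- Differentiate f : R -> R at a point by seeding the tangent with 1.
deriv :: (Dual -> Dual) -> Double -> Double
deriv f x = tangent (f (Dual x 1))

-- Higher-order variant: instead of a single tangent, carry a
-- truncated Taylor-style expansion; degree 2 tracks f, f' and f''.
data Dual2 = Dual2 Double Double Double

instance Num Dual2 where
  Dual2 x dx ddx + Dual2 y dy ddy = Dual2 (x + y) (dx + dy) (ddx + ddy)
  Dual2 x dx ddx * Dual2 y dy ddy =     -- (fg)'' = f''g + 2f'g' + fg''
    Dual2 (x * y) (x * dy + dx * y) (x * ddy + 2 * dx * dy + ddx * y)
  negate (Dual2 x dx ddx) = Dual2 (negate x) (negate dx) (negate ddx)
  fromInteger n           = Dual2 (fromInteger n) 0 0
  abs    = error "abs: not differentiable at 0"
  signum = error "signum: not differentiable at 0"

-- Second derivative: seed with (x, 1, 0) and read off the last slot.
deriv2 :: (Dual2 -> Dual2) -> Double -> Double
deriv2 f x = case f (Dual2 x 1 0) of Dual2 _ _ ddx -> ddx

main :: IO ()
main = do
  print (deriv  (\x -> sinD (x * x)) 1.0)  -- 2 * cos 1 ≈ 1.0806
  print (deriv2 (\x -> x * x * x)   2.0)   -- 6 * 2     = 12.0
```

The sketch only shows the computational idea: pushing derivative information forward through each operation. The paper's contribution is characterising such a macro as the unique structure-preserving one on a higher-order language and proving it correct against a diffeological semantics via gluing, none of which the sketch attempts.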
