Smoothing Methods for Automatic Differentiation Across Conditional Branches (2310.03585v2)
Abstract: Programs involving discontinuities introduced by control flow constructs such as conditional branches pose challenges to mathematical optimization methods that assume a degree of smoothness in the objective function's response surface. Smooth interpretation (SI) is a form of abstract interpretation that approximates the convolution of a program's output with a Gaussian kernel, thus smoothing its output in a principled manner. Here, we combine SI with automatic differentiation (AD) to efficiently compute gradients of smoothed programs. In contrast to AD across a regular program execution, these gradients also capture the effects of alternative control flow paths. The combination of SI with AD enables direct gradient-based parameter synthesis for branching programs, allowing, for instance, the calibration of simulation models or their combination with neural network models in machine learning pipelines. We detail the effects of the approximations made for tractability in SI and propose a novel Monte Carlo estimator that avoids the underlying assumptions by estimating the smoothed programs' gradients through a combination of AD and sampling. Using DiscoGrad, our tool for automatically translating simple C++ programs to a smooth differentiable form, we perform an extensive evaluation. We compare the combination of SI with AD and our Monte Carlo estimator to existing gradient-free and stochastic methods on four non-trivial and originally discontinuous problems ranging from classical simulation-based optimization to neural network-driven control. While the optimization progress with the SI-based estimator depends on the complexity of the program's control flow, our Monte Carlo estimator is competitive in all problems, exhibiting the fastest convergence by a substantial margin in our highest-dimensional problem.
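To make the idea of Gaussian smoothing concrete, the following minimal Python sketch smooths a single-branch toy program and estimates the gradient of the smoothed output by sampling. It is an illustration under stated assumptions, not the estimator proposed in the paper and not DiscoGrad's API: the function names (`program`, `smoothed_exact`, `smoothed_grad_exact`, `smoothed_grad_mc`) and the values of `x` and `sigma` are hypothetical, and the sampling step uses the standard score-function identity for Gaussian perturbations rather than a combination of AD and sampling.

```python
import math
import random


def program(x):
    """Toy branching program (hypothetical): a unit step at x = 0."""
    if x > 0.0:
        return 1.0
    return 0.0


def smoothed_exact(x, sigma):
    """Closed form of E[program(x + sigma * eps)], eps ~ N(0, 1): Phi(x / sigma)."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def smoothed_grad_exact(x, sigma):
    """Derivative of Phi(x / sigma) with respect to x (a Gaussian density)."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def smoothed_grad_mc(f, x, sigma, n, rng):
    """Score-function estimate of d/dx E[f(x + sigma * eps)], eps ~ N(0, 1).

    Uses the identity d/dx E[f(x + sigma * eps)] = E[f(x + sigma * eps) * eps] / sigma,
    which only requires evaluating f, not differentiating it.
    """
    total = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        total += f(x + sigma * eps) * eps
    return total / (n * sigma)


# Small demonstration with assumed values for x and sigma.
rng = random.Random(0)
x, sigma = 0.3, 0.5
print("smoothed output:        ", smoothed_exact(x, sigma))
print("exact smoothed gradient:", smoothed_grad_exact(x, sigma))
print("sampled gradient:       ", smoothed_grad_mc(program, x, sigma, 20000, rng))
```

The derivative of `program` itself is zero wherever it is defined, so differentiating along a single execution path carries no information about the branch. The smoothed gradient is nonzero near the discontinuity, which is the effect of alternative control flow paths that the abstract refers to.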