Dynamic Anisotropic Smoothing for Noisy Derivative-Free Optimization (2405.01731v1)
Abstract: We propose a novel algorithm that extends the methods of ball smoothing and Gaussian smoothing for noisy derivative-free optimization by accounting for the heterogeneous curvature of the objective function. The algorithm dynamically adapts the shape of the smoothing kernel to approximate the Hessian of the objective function around a local optimum. This approach significantly reduces the error of sampling-based gradient estimates computed from noisy function evaluations. We demonstrate the efficacy of our method through numerical experiments on synthetic problems. Additionally, we show improved performance when tuning solvers for NP-hard combinatorial optimization problems, compared to existing state-of-the-art heuristic derivative-free and Bayesian optimization methods.
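The abstract describes two coupled ideas: a sampling-based gradient estimator for a Gaussian-smoothed objective, and a smoothing kernel whose shape is adapted to the local curvature. The paper's exact update rules are not reproduced here; the sketch below is a minimal illustration of that general scheme under stated assumptions. It uses the standard two-point estimator for the gradient of an anisotropically smoothed objective, and reshapes the kernel covariance toward the inverse of a diagonal finite-difference curvature estimate. The function names (`smoothed_grad`, `das_minimize`) and all design choices (fixed-trace normalization, EMA curvature smoothing, Sigma-preconditioned steps) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def smoothed_grad(f, x, Sigma, n_samples, rng):
    """Two-point Monte Carlo estimate of the gradient of the Gaussian-smoothed
    objective f_Sigma(x) = E_{u ~ N(0, Sigma)}[f(x + u)], via the identity
    grad f_Sigma(x) = E[(f(x + u) - f(x - u)) / 2 * Sigma^{-1} u]."""
    L = np.linalg.cholesky(Sigma)            # draw u = L z with z ~ N(0, I)
    Sigma_inv = np.linalg.inv(Sigma)
    g = np.zeros_like(x)
    for _ in range(n_samples):
        u = L @ rng.standard_normal(x.size)
        g += 0.5 * (f(x + u) - f(x - u)) * (Sigma_inv @ u)
    return g / n_samples

def das_minimize(f, x0, iters=100, lr=0.1, n_samples=64,
                 h=0.3, beta=0.9, eps=1e-8, trace=1.0, seed=0):
    """Illustrative loop: the kernel covariance is repeatedly reshaped toward
    the inverse of a running (EMA-smoothed) diagonal curvature estimate, so
    sampling is wide along flat directions and narrow along steep ones. The
    trace of Sigma is held fixed, so only the kernel's shape adapts."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    Sigma = np.eye(d) * (trace / d)          # start from an isotropic kernel
    curv = np.ones(d)
    for _ in range(iters):
        g = smoothed_grad(f, x, Sigma, n_samples, rng)
        # Precondition the step with Sigma: since Sigma tracks the inverse
        # curvature, this gives the update a quasi-Newton flavor.
        x = x - lr * (Sigma @ g)
        # Diagonal curvature via central second differences; h must stay
        # large enough that dividing by h^2 does not amplify evaluation noise.
        fx = f(x)
        new_curv = np.empty(d)
        for i in range(d):
            e = np.zeros(d); e[i] = h
            new_curv[i] = (f(x + e) - 2.0 * fx + f(x - e)) / h**2
        curv = beta * curv + (1.0 - beta) * new_curv
        w = 1.0 / np.maximum(np.abs(curv), eps)   # inverse-curvature weights
        Sigma = np.diag(trace * w / w.sum())      # renormalize the trace
    return x
```

On an ill-conditioned test function the adapted kernel elongates along flat coordinates and shrinks along stiff ones, which is the error-reduction mechanism the abstract refers to. A hypothetical usage example:

```python
# Ill-conditioned quadratic with additive evaluation noise.
noise_rng = np.random.default_rng(1)
H = np.diag([100.0, 1.0])
noisy_f = lambda x: 0.5 * x @ H @ x + 0.01 * noise_rng.standard_normal()
x_opt = das_minimize(noisy_f, x0=np.array([1.0, 1.0]))
```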