Offline Model-Based Optimization via Policy-Guided Gradient Search (2405.05349v1)
Abstract: Offline optimization is an emerging problem in many experimental engineering domains including protein, drug or aircraft design, where online experimentation to collect evaluation data is too expensive or dangerous. Instead, one must optimize an unknown function given only its offline evaluations at a fixed set of inputs. A naive solution to this problem is to learn a surrogate model of the unknown function and optimize this surrogate instead. However, such a naive optimizer is prone to erroneous overestimation by the surrogate (possibly due to over-fitting on a biased sample of function evaluations) on inputs outside the offline dataset. Prior approaches addressing this challenge have primarily focused on learning robust surrogate models. However, their search strategies are derived from the surrogate model rather than the actual offline data. To fill this important gap, we introduce a new learning-to-search perspective for offline optimization by reformulating it as an offline reinforcement learning problem. Our proposed policy-guided gradient search approach explicitly learns the best policy for a given surrogate model created from the offline data. Our empirical results on multiple benchmarks demonstrate that the learned optimization policy can be combined with existing offline surrogates to significantly improve the optimization performance.
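To make the contrast in the abstract concrete, the following toy sketch compares naive gradient ascent on a learned surrogate with a policy-guided search, where each step is delegated to a policy conditioned on the current candidate and the surrogate gradient. The `conservative_policy` below is a hypothetical hand-written stand-in for the learned policy (the paper learns it via offline RL); the 1-D quadratic setup, the surrogate fit, and all function names are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

# Toy offline dataset: inputs x and noisy scores y from a hidden function
# f(x) = -(x - 2)^2. The sample covers only [-1, 1], so it is biased,
# as is typical in offline optimization.
rng = np.random.default_rng(0)
x_off = rng.uniform(-1.0, 1.0, size=50)
y_off = -(x_off - 2.0) ** 2 + 0.01 * rng.normal(size=50)

# Surrogate: a degree-2 polynomial fit to the offline data.
surrogate = np.poly1d(np.polyfit(x_off, y_off, deg=2))
surrogate_grad = surrogate.deriv()

def naive_gradient_search(x0, lr=0.1, steps=100):
    """Plain gradient ascent on the surrogate; trusts it everywhere."""
    x = x0
    for _ in range(steps):
        x = x + lr * surrogate_grad(x)
    return x

def policy_guided_search(x0, policy, steps=100):
    """Each update is chosen by a policy, not by a fixed ascent rule."""
    x = x0
    for _ in range(steps):
        x = x + policy(x, surrogate_grad(x))
    return x

def conservative_policy(x, grad, lr=0.1):
    """Hypothetical policy: follow the surrogate gradient, but shrink the
    step as the candidate drifts away from the offline data (a crude
    conservatism heuristic standing in for the learned policy)."""
    dist = np.min(np.abs(x - x_off))
    return lr * grad / (1.0 + dist)

x_start = x_off[np.argmax(y_off)]  # best design in the offline dataset
x_naive = naive_gradient_search(x_start)
x_policy = policy_guided_search(x_start, conservative_policy)
```

The point of the sketch is the interface: once the search is expressed as a sequence of policy decisions over surrogate states, the policy itself can be trained with offline RL rather than fixed by hand.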
- ROBEL: Robotics Benchmarks for Learning with Low-Cost Robots. In Conference on Robot Learning, 1300–1313. PMLR.
- Learning to learn by gradient descent by gradient descent. Advances in neural information processing systems, 29.
- OpenAI Gym. arXiv preprint arXiv:1606.01540.
- Conditioning by adaptive sampling for robust design. CoRR, abs/1901.10060.
- Bidirectional Learning for Offline Infinite-width Model-based Optimization. CoRR, abs/2209.07507.
- Search-based structured prediction. Machine learning, 75: 297–325.
- Combining Latent Space and Structured Kernels for Bayesian Optimization over Combinatorial Spaces. In NeurIPS, 8185–8200.
- MOOS: A Multi-Objective Design Space Exploration and Optimization Framework for NoC enabled Manycore Systems. ACM TECS.
- Bayesian optimization of nanoporous materials. Molecular Systems Design & Engineering, 6(12): 1066–1086.
- HC-Search: A learning framework for search-based structured prediction. Journal of Artificial Intelligence Research, 50: 369–407.
- Scalable global optimization via local Bayesian optimization. Advances in neural information processing systems, 32.
- Autofocused oracles for model-based design. CoRR, abs/2006.08052.
- Meta-learning with warped gradient descent. arXiv preprint arXiv:1909.00025.
- Offline Model-Based Optimization via Normalized Maximum Likelihood Estimation. CoRR, abs/2102.07970.
- Benchmarking Batch Deep Reinforcement Learning Algorithms. arXiv preprint arXiv:1910.01708.
- A Minimalist Approach to Offline Reinforcement Learning. In Ranzato, M.; Beygelzimer, A.; Dauphin, Y. N.; Liang, P.; and Vaughan, J. W., eds., Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, 20132–20145.
- Deep learning in protein structural modeling and design. Patterns, 1(9).
- Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In Dy, J. G.; and Krause, A., eds., Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, volume 80 of Proceedings of Machine Learning Research, 1856–1865. PMLR.
- beta-VAE: Learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations.
- Machine learning for electronic design automation: A survey. ACM Transactions on Design Automation of Electronic Systems, 26(5): 1–46.
- MOReL: Model-Based Offline Reinforcement Learning. In Larochelle, H.; Ranzato, M.; Hadsell, R.; Balcan, M.; and Lin, H., eds., Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual.
- Auto-Encoding Variational Bayes.
- Offline Reinforcement Learning with Implicit Q-Learning. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net.
- Generative Pretraining for Black-Box Optimization. arXiv preprint arXiv:2206.10786.
- Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction. In Wallach, H. M.; Larochelle, H.; Beygelzimer, A.; d’Alché-Buc, F.; Fox, E. B.; and Garnett, R., eds., Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, 11761–11771.
- Model inversion networks for model-based optimization. Advances in Neural Information Processing Systems, 33: 5126–5137.
- Conservative Q-Learning for Offline Reinforcement Learning. In Larochelle, H.; Ranzato, M.; Hadsell, R.; Balcan, M.; and Lin, H., eds., Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual.
- Learning to optimize. arXiv preprint arXiv:1606.01885.
- Conditional Generative Adversarial Nets. CoRR, abs/1411.1784.
- Human-level control through deep reinforcement learning. Nature, 518(7540): 529–533.
- Value prediction network. Advances in neural information processing systems, 30.
- Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning. CoRR, abs/1910.00177.
- Evaluating protein transfer learning with TAPE. Advances in neural information processing systems, 32.
- Human 5′ UTR design and variant effect prediction from a massively parallel translation assay. Nature Biotechnology, 37(7): 803–809.
- Rethinking drug design in the artificial intelligence era. Nature Reviews Drug Discovery, 19(5): 353–364.
- Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1): 148–175.
- Reinforcement learning: An introduction. MIT press.
- Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization.
- Conservative Objective Models for Effective Offline Model-Based Optimization. CoRR, abs/2107.06882.
- Scientific discovery in the age of artificial intelligence. Nature, 620(7972): 47–60.
- Behavior Regularized Offline Reinforcement Learning.
- Representation matters: offline pretraining for sequential decision making. In International Conference on Machine Learning, 11784–11794. PMLR.
- RoMA: Robust Model Adaptation for Offline Model-based Optimization. CoRR, abs/2110.14188.
- COMBO: Conservative Offline Model-Based Policy Optimization. In Ranzato, M.; Beygelzimer, A.; Dauphin, Y. N.; Liang, P.; and Vaughan, J. W., eds., Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, 28954–28967.
- MOPO: Model-based Offline Policy Optimization. In Larochelle, H.; Ranzato, M.; Hadsell, R.; Balcan, M.; and Lin, H., eds., Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual.