
Diffusion Model for Data-Driven Black-Box Optimization (2403.13219v1)

Published 20 Mar 2024 in cs.LG and math.OC

Abstract: Generative AI has redefined artificial intelligence, enabling the creation of innovative content and customized solutions that drive business practices into a new era of efficiency and creativity. In this paper, we focus on diffusion models, a powerful generative AI technology, and investigate their potential for black-box optimization over complex structured variables. Consider the practical scenario where one wants to optimize some structured design in a high-dimensional space, based on massive unlabeled data (representing design variables) and a small labeled dataset. We study two practical types of labels: 1) noisy measurements of a real-valued reward function and 2) human preference based on pairwise comparisons. The goal is to generate new designs that are near-optimal and preserve the designed latent structures. Our proposed method reformulates the design optimization problem into a conditional sampling problem, which allows us to leverage the power of diffusion models for modeling complex distributions. In particular, we propose a reward-directed conditional diffusion model, to be trained on the mixed data, for sampling a near-optimal solution conditioned on high predicted rewards. Theoretically, we establish sub-optimality error bounds for the generated designs. The sub-optimality gap nearly matches the optimal guarantee in off-policy bandits, demonstrating the efficiency of reward-directed diffusion models for black-box optimization. Moreover, when the data admits a low-dimensional latent subspace structure, our model efficiently generates high-fidelity designs that closely respect the latent structure. We provide empirical experiments validating our model in decision-making and content-creation tasks.
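The abstract's core idea, reformulating design optimization as conditional sampling from a reward-tilted distribution, can be illustrated with a deliberately tiny sketch. This is not the paper's method (no neural score network, no learned diffusion): it is a 1D toy where the design prior is Gaussian, the reward model is a least-squares fit on a small labeled set, and "reward-directed" generation is Langevin sampling from the tilted density p(x)·exp(β·f(x)). All variable names and the β tilt parameter are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unlabeled data: designs drawn from an unknown "design prior" (here N(0, 1)).
x_unlabeled = rng.normal(size=5000)

# Small labeled set: noisy reward measurements y = x + noise (label type 1 in the abstract).
x_labeled = rng.normal(size=100)
y = x_labeled + 0.1 * rng.normal(size=100)

# 1) Fit a linear reward model f(x) = w * x by least squares
#    (a stand-in for the paper's learned reward predictor).
w = float(np.dot(x_labeled, y) / np.dot(x_labeled, x_labeled))

# 2) Estimate the prior's score from unlabeled data. For a fitted Gaussian
#    N(mu, sigma^2), the score is s(x) = -(x - mu) / sigma^2.
mu, sigma2 = x_unlabeled.mean(), x_unlabeled.var()

def score(x):
    return -(x - mu) / sigma2

# 3) Reward-directed sampling: target density ∝ p(x) * exp(beta * f(x)),
#    so the Langevin drift is score(x) + beta * w (gradient of the log tilt).
beta, step, n_steps = 2.0, 0.05, 500
x = rng.normal(size=1000)  # initialize from the prior
for _ in range(n_steps):
    drift = score(x) + beta * w
    x = x + step * drift + np.sqrt(2 * step) * rng.normal(size=x.shape)

unguided_reward = w * x_unlabeled.mean()   # ~0: prior samples ignore the reward
guided_reward = w * x.mean()               # shifted toward high predicted reward
print(f"mean predicted reward: unguided {unguided_reward:.2f}, guided {guided_reward:.2f}")
```

Guided samples concentrate around the reward-tilted mean (≈ β·w·σ² above the prior mean), which is the mechanism the abstract describes: conditioning generation on high predicted rewards rather than optimizing designs directly. The paper's actual contribution replaces the Gaussian score with a learned diffusion model and analyzes the resulting sub-optimality gap.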

Authors (7)
  1. Zihao Li
  2. Hui Yuan
  3. Kaixuan Huang
  4. Chengzhuo Ni
  5. Yinyu Ye
  6. Minshuo Chen
  7. Mengdi Wang
Citations (6)
