Thompson sampling for improved exploration in GFlowNets (2306.17693v1)

Published 30 Jun 2023 in cs.LG

Abstract: Generative flow networks (GFlowNets) are amortized variational inference algorithms that treat sampling from a distribution over compositional objects as a sequential decision-making problem with a learnable action policy. Unlike other algorithms for hierarchical sampling that optimize a variational bound, GFlowNet algorithms can stably run off-policy, which can be advantageous for discovering modes of the target distribution. Despite this flexibility in the choice of behaviour policy, the optimal way of efficiently selecting trajectories for training has not yet been systematically explored. In this paper, we view the choice of trajectories for training as an active learning problem and approach it using Bayesian techniques inspired by methods for multi-armed bandits. The proposed algorithm, Thompson sampling GFlowNets (TS-GFN), maintains an approximate posterior distribution over policies and samples trajectories from this posterior for training. We show in two domains that TS-GFN yields improved exploration and thus faster convergence to the target distribution than the off-policy exploration strategies used in past work.
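
To make the core idea concrete, here is a minimal sketch (not the paper's implementation) of Thompson-sampling exploration for a GFlowNet in PyTorch. It approximates the posterior over forward policies with a small ensemble of policy heads sharing a backbone (a bootstrapped-ensemble approximation, one common choice), draws one head per episode to sample a trajectory, and trains on that trajectory with the trajectory-balance objective, a standard GFlowNet loss. The toy bit-string environment, network sizes, and ensemble-based posterior are all illustrative assumptions, not details taken from the paper.

    # Illustrative sketch of Thompson-sampling exploration for a GFlowNet.
    # All environment and architecture details are assumptions for the demo.
    import torch
    import torch.nn as nn

    MAX_LEN = 6            # illustrative trajectory length
    STATE_DIM = MAX_LEN    # one slot per position, encoded as +/-1 (0 = empty)
    N_ACTIONS = 2          # append a 0-bit or a 1-bit
    N_HEADS = 10           # ensemble size approximating the policy posterior

    class ToyEnv:
        """Toy tree-structured environment: build a bit-string of length
        MAX_LEN; the reward favours strings with many ones. The tree
        structure makes the backward policy deterministic, so log P_B = 0
        in the trajectory-balance loss below."""
        def reset(self):
            self.bits = []
            return self._encode()

        def _encode(self):
            x = torch.zeros(1, STATE_DIM)
            for i, b in enumerate(self.bits):
                x[0, i] = 2.0 * b - 1.0  # 0 -> -1, 1 -> +1; empty slots stay 0
            return x

        def step(self, action):
            self.bits.append(action)
            done = len(self.bits) == MAX_LEN
            return self._encode(), done

        def log_reward(self):
            return torch.log(torch.tensor(1.0 + float(sum(self.bits))))

    class EnsembleGFN(nn.Module):
        """Forward policy with a shared backbone and N_HEADS output heads;
        the heads act as samples from an approximate posterior over
        policies."""
        def __init__(self):
            super().__init__()
            self.backbone = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU())
            self.heads = nn.ModuleList(
                [nn.Linear(64, N_ACTIONS) for _ in range(N_HEADS)]
            )
            self.log_Z = nn.Parameter(torch.zeros(1))  # learned log-partition

        def logits(self, state, head_idx):
            return self.heads[head_idx](self.backbone(state))

    def train_step(model, env, optimizer):
        # Thompson sampling: draw ONE head and act with it for the whole
        # episode, instead of adding per-step noise to a single policy.
        head_idx = torch.randint(N_HEADS, (1,)).item()
        state, done = env.reset(), False
        log_pf = torch.zeros(1)  # running sum of forward log-probabilities
        while not done:
            dist = torch.distributions.Categorical(
                logits=model.logits(state, head_idx)
            )
            action = dist.sample()
            log_pf = log_pf + dist.log_prob(action)
            state, done = env.step(int(action))
        # Trajectory-balance loss: (log Z + log P_F(tau) - log R(x))^2,
        # with log P_B omitted because the toy state space is a tree.
        loss = (model.log_Z + log_pf - env.log_reward()) ** 2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    model = EnsembleGFN()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for step in range(200):
        train_step(model, ToyEnv(), optimizer)

Committing to a single posterior sample for an entire episode gives temporally consistent exploration, in contrast to per-step dithering schemes such as epsilon-noisy or tempered policies, which is the kind of off-policy exploration strategy the paper compares against.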
