
Maximum entropy GFlowNets with soft Q-learning (2312.14331v2)

Published 21 Dec 2023 in cs.LG

Abstract: Generative Flow Networks (GFNs) have emerged as a powerful tool for sampling discrete objects from unnormalized distributions, offering a scalable alternative to Markov Chain Monte Carlo (MCMC) methods. While GFNs draw inspiration from maximum entropy reinforcement learning (RL), the connection between the two has largely been unclear and seemingly applicable only in specific cases. This paper addresses the connection by constructing an appropriate reward function, thereby establishing an exact relationship between GFNs and maximum entropy RL. This construction allows us to introduce maximum entropy GFNs, which, in contrast to GFNs with uniform backward policy, achieve the maximum entropy attainable by GFNs without constraints on the state space.
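The construction claimed in the abstract can be sketched as follows. This is a minimal, hedged illustration, not the paper's exact formulation: the symbols R (terminal reward), P_B (backward policy), P_F (forward policy), F (state flows), and Q_soft / V_soft (soft Q-learning quantities) are assumed notation. The idea is to assign, on the GFN's directed acyclic graph, the per-transition reward log P_B(s | s') to each intermediate edge and log R(x) to each terminating edge; soft Q-learning (entropy-regularized, temperature 1, deterministic transitions) on this reward then recovers the GFN forward policy, with the soft value playing the role of the log state flow:

% Hedged sketch of the GFN / maximum-entropy-RL correspondence suggested by the
% abstract; notation is assumed and may differ from the paper's.
\begin{align*}
  % Intermediate reward: log-probability of the backward transition;
  % terminal reward: log of the unnormalized target R(x).
  r(s \to s') &= \log P_B(s \mid s'), \qquad r(x \to \top) = \log R(x), \\
  % Soft Bellman equations on the DAG (deterministic transitions, temperature 1):
  Q_{\mathrm{soft}}(s, s') &= r(s \to s') + V_{\mathrm{soft}}(s'), \qquad
  V_{\mathrm{soft}}(s) = \log \!\sum_{s'' \in \mathrm{Ch}(s)} \exp Q_{\mathrm{soft}}(s, s''), \\
  % At the fixed point, the optimal soft policy matches the GFN forward policy,
  % and the soft value equals the log state flow (via flow conservation
  % F(s) = sum_{s'} F(s') P_B(s | s')):
  P_F(s' \mid s) &= \exp\bigl(Q_{\mathrm{soft}}(s, s') - V_{\mathrm{soft}}(s)\bigr)
   = \frac{F(s')\, P_B(s \mid s')}{F(s)}, \qquad V_{\mathrm{soft}}(s) = \log F(s).
\end{align*}

Read this way, the "maximum entropy GFNs" of the title plausibly correspond to choosing the backward policy P_B to maximize the entropy attainable by the sampler rather than fixing it to be uniform, which is consistent with the abstract's contrast with uniform-backward-policy GFNs.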

