Pre-Training and Fine-Tuning Generative Flow Networks (2310.03419v1)
Abstract: Generative Flow Networks (GFlowNets) are amortized samplers that learn stochastic policies to sequentially generate compositional objects from a given unnormalized reward distribution. They can generate diverse sets of high-reward objects, which is an important consideration in scientific discovery tasks. However, because they are typically trained from a given extrinsic reward function, how to leverage the power of pre-training and train GFlowNets in an unsupervised fashion for efficient adaptation to downstream tasks remains an important open challenge. Inspired by recent successes of unsupervised pre-training in various domains, we introduce a novel approach for reward-free pre-training of GFlowNets. By framing the training as a self-supervised problem, we propose an outcome-conditioned GFlowNet (OC-GFN) that learns to explore the candidate space. Specifically, OC-GFN learns to reach any targeted outcome, akin to goal-conditioned policies in reinforcement learning. We show that the pre-trained OC-GFN model allows direct extraction of a policy capable of sampling from any new reward function in downstream tasks. However, adapting OC-GFN to a downstream task-specific reward involves an intractable marginalization over possible outcomes. We propose a novel way to approximate this marginalization by learning an amortized predictor, which enables efficient fine-tuning. Extensive experimental results validate the efficacy of our approach, demonstrating the effectiveness of pre-training the OC-GFN and its ability to swiftly adapt to downstream tasks and discover modes more efficiently. This work may serve as a foundation for further exploration of pre-training strategies in the context of GFlowNets.
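To make the two ingredients of the abstract concrete, here is a minimal, hypothetical PyTorch sketch of (i) an outcome-conditioned forward policy that selects construction actions given the current partial object and a target outcome, and (ii) an amortized proposal over outcomes used at fine-tuning time in place of the intractable marginalization. All names, dimensions, and network shapes are illustrative assumptions and not the paper's actual implementation.

```python
# Hypothetical sketch; discrete outcomes encoded as one-hot vectors,
# toy dimensions chosen for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class OutcomeConditionedPolicy(nn.Module):
    """Forward policy P_F(a | s, y): action logits given state s and target outcome y,
    trained reward-free to reach any targeted outcome (the OC-GFN idea)."""
    def __init__(self, state_dim: int, outcome_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + outcome_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor, outcome: torch.Tensor) -> torch.Tensor:
        # Unnormalized logits; a softmax over them gives the action distribution.
        return self.net(torch.cat([state, outcome], dim=-1))


class AmortizedOutcomePredictor(nn.Module):
    """Learned proposal over outcomes, standing in for the intractable sum over all
    possible outcomes when adapting to a new downstream reward R(y)."""
    def __init__(self, n_outcomes: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_outcomes))

    def distribution(self) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.logits)


if __name__ == "__main__":
    state_dim, n_outcomes, n_actions = 8, 4, 3
    policy = OutcomeConditionedPolicy(state_dim, n_outcomes, n_actions)
    predictor = AmortizedOutcomePredictor(n_outcomes)

    # Fine-tuning step sketch: propose an outcome, then condition the pre-trained
    # policy on it. The predictor would be trained so that outcomes with high
    # downstream reward are proposed proportionally more often.
    y_index = predictor.distribution().sample()
    y = F.one_hot(y_index, n_outcomes).float()
    s = torch.zeros(state_dim)  # e.g., the empty initial object
    action = torch.distributions.Categorical(logits=policy(s, y)).sample()
    print("proposed outcome:", y_index.item(), "first action:", action.item())
```

The sketch only illustrates the conditioning and proposal structure; the actual training objectives (flow-matching or trajectory-balance losses for the policy, and the reward-weighted objective for the amortized predictor) are described in the paper itself.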