AlignDiff: Aligning Diverse Human Preferences via Behavior-Customisable Diffusion Model (2310.02054v2)

Published 3 Oct 2023 in cs.AI

Abstract: Aligning agent behaviors with diverse human preferences remains a challenging problem in reinforcement learning (RL), owing to the inherent abstractness and mutability of human preferences. To address these issues, we propose AlignDiff, a novel framework that leverages RL from Human Feedback (RLHF) to quantify human preferences, addressing their abstractness, and uses them to guide diffusion planning for zero-shot behavior customization, addressing their mutability. AlignDiff can accurately match user-customized behaviors and efficiently switch from one behavior to another. To build the framework, we first construct multi-perspective human feedback datasets containing comparisons over the attributes of diverse behaviors, and then train an attribute strength model to predict quantified relative strengths. After relabeling behavioral datasets with these relative strengths, we train an attribute-conditioned diffusion model, which serves as a planner, with the attribute strength model acting as a director for preference alignment at inference time. We evaluate AlignDiff on various locomotion tasks and demonstrate its superior performance in preference matching, switching, and covering compared with other baselines. Its ability to complete unseen downstream tasks under human instructions also showcases its promise for human-AI collaboration. Visualization videos are available at https://aligndiff.github.io/.
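The abstract describes a two-stage pipeline: an attribute strength model trained on pairwise human comparisons of behavior attributes, and an attribute-conditioned diffusion planner guided by target attribute strengths at inference. The sketch below illustrates those two pieces in PyTorch-style code. All names and interfaces (AttributeStrengthModel, pairwise_preference_loss, guided_plan, the planner's denoise_step, the guidance scale) are illustrative assumptions based only on the abstract, not the authors' implementation.

```python
# Minimal sketch of the two components described in the abstract.
# Everything here (class/function names, shapes, the planner interface) is an
# assumption for illustration; see the paper for the actual method.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttributeStrengthModel(nn.Module):
    """Predicts relative strengths of behavior attributes for a trajectory segment."""

    def __init__(self, obs_dim: int, n_attributes: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_attributes),
        )

    def forward(self, segment: torch.Tensor) -> torch.Tensor:
        # segment: (batch, horizon, obs_dim) -> (batch, n_attributes)
        return self.net(segment).mean(dim=1)  # pool per-step scores over time


def pairwise_preference_loss(model, seg_a, seg_b, labels):
    """Bradley-Terry-style loss on per-attribute pairwise comparisons.

    labels: (batch, n_attributes), 1.0 if seg_a shows the attribute more strongly,
    0.0 if seg_b does. This is one standard way to learn from pairwise feedback.
    """
    logits = model(seg_a) - model(seg_b)
    return F.binary_cross_entropy_with_logits(logits, labels.float())


@torch.no_grad()
def guided_plan(planner, target_attr, n_steps=50, guidance_scale=1.5):
    """Sample a trajectory from an attribute-conditioned diffusion planner.

    Uses classifier-free guidance toward the user's target attribute strengths.
    `planner` is assumed to expose horizon/obs_dim, a noise-prediction call
    planner(traj, t, cond=...), and a single reverse-diffusion step denoise_step().
    """
    traj = torch.randn(1, planner.horizon, planner.obs_dim)
    for t in reversed(range(n_steps)):
        eps_cond = planner(traj, t, cond=target_attr)   # conditioned on preferences
        eps_uncond = planner(traj, t, cond=None)        # unconditional prediction
        eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
        traj = planner.denoise_step(traj, eps, t)       # standard DDPM/DDIM update
    return traj
```

At inference, the attribute strength model could then act as the "director" by scoring sampled plans against the requested attribute strengths and keeping the closest match; this is one plausible reading of the abstract, not a statement of the paper's exact mechanism.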
