Leveraging Human Revisions for Improving Text-to-Layout Models (2405.13026v1)
Abstract: Learning from human feedback has shown success in aligning large, pretrained models with human values. Prior work has mostly focused on learning from high-level labels, such as preferences between pairs of model outputs. However, many domains could benefit from more involved, detailed feedback, such as revisions, explanations, and reasoning from human users. Our work proposes using nuanced feedback in the form of human revisions for stronger alignment. In this paper, we ask expert designers to fix layouts generated by a generative layout model pretrained on a large-scale dataset of mobile screens. We then train a reward model based on how human designers revise these generated layouts. With the learned reward model, we optimize our model with reinforcement learning from human feedback (RLHF). Our method, Revision-Aware Reward Models, allows a generative text-to-layout model to produce more modern, designer-aligned layouts, demonstrating the potential of human revisions and stronger forms of feedback for improving generative models.
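The pipeline the abstract describes can be sketched roughly as follows: collect designer revisions of generated layouts, train a reward model on them, then fine-tune the generator with RLHF. The sketch below is a minimal illustration, not the authors' implementation; the layout encoding, the `LayoutRewardModel` architecture, and the pairwise Bradley-Terry-style objective (preferring the revised layout over the original generation for the same prompt) are all assumptions introduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LayoutRewardModel(nn.Module):
    """Hypothetical reward model: scores a layout conditioned on a text prompt.

    Layouts are assumed to be a fixed-size sequence of element vectors
    (e.g., type embedding + normalized bounding box); prompts are assumed
    to be a single pooled text embedding. Neither encoding is specified here.
    """

    def __init__(self, elem_dim: int = 8, text_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.elem_proj = nn.Linear(elem_dim, hidden)
        self.text_proj = nn.Linear(text_dim, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(hidden, 1)

    def forward(self, prompt_emb, layout_elems):
        # Prepend the projected prompt as a [CLS]-like token, encode the
        # sequence, and read the scalar reward from that token.
        cls = self.text_proj(prompt_emb).unsqueeze(1)            # (B, 1, H)
        tokens = torch.cat([cls, self.elem_proj(layout_elems)], dim=1)
        encoded = self.encoder(tokens)
        return self.head(encoded[:, 0]).squeeze(-1)              # (B,)


def revision_ranking_loss(model, prompt_emb, generated, revised):
    """Assumed training signal: the designer-revised layout should score
    higher than the original model generation for the same prompt."""
    r_gen = model(prompt_emb, generated)
    r_rev = model(prompt_emb, revised)
    return -F.logsigmoid(r_rev - r_gen).mean()


if __name__ == "__main__":
    model = LayoutRewardModel()
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    # Toy batch: 4 prompts, each with a 16-element generated and revised layout.
    prompt = torch.randn(4, 64)
    generated = torch.randn(4, 16, 8)
    revised = torch.randn(4, 16, 8)
    loss = revision_ranking_loss(model, prompt, generated, revised)
    loss.backward()
    opt.step()
    print(float(loss))
```

The learned reward would then supply the scalar feedback for an RLHF loop (e.g., PPO-style fine-tuning of the text-to-layout model), which is outside the scope of this sketch; the actual paper may also exploit the magnitude or nature of the revisions rather than a purely pairwise signal.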