CoFRIDA: Self-Supervised Fine-Tuning for Human-Robot Co-Painting (2402.13442v1)
Abstract: Prior robot painting and drawing work, such as FRIDA, has focused on decreasing the sim-to-real gap and expanding input modalities for users, but interaction with these systems generally exists only in the input stages. To support interactive, human-robot collaborative painting, we introduce the Collaborative FRIDA (CoFRIDA) robot painting framework, which can co-paint by modifying and engaging with content already painted by a human collaborator. To improve text-image alignment, FRIDA's major weakness, our system uses pre-trained text-to-image models; however, pre-trained models do not perform well in real-world co-painting because they (1) do not understand the constraints and abilities of the robot and (2) cannot co-paint without making unrealistic edits to the canvas and overwriting content. We propose a self-supervised fine-tuning procedure that tackles both issues, allowing pre-trained state-of-the-art text-image alignment models to be used with robots for co-painting in the physical world. Our open-source approach, CoFRIDA, creates paintings and drawings that match the input text prompt more clearly than FRIDA does, both from a blank canvas and from one with human-created content. More generally, our fine-tuning procedure successfully encodes the robot's constraints and abilities into a foundation model, showcasing promising results as an effective method for reducing sim-to-real gaps.
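The abstract describes the self-supervised fine-tuning procedure only at a high level. As a rough illustration of how such self-supervised co-painting data might be assembled, the sketch below builds (partial canvas, full canvas, caption) training triplets by simulating a complete drawing for a caption and then dropping a random subset of strokes; `Stroke`, `plan_strokes`, and `render` are hypothetical placeholders standing in for a FRIDA-style stroke planner and simulator, not the paper's actual implementation.

```python
# Illustrative sketch (not the paper's code): build self-supervised
# (partial_canvas, full_canvas, caption) triplets by simulating a complete
# robot drawing for a caption and then removing a random subset of strokes.
import random
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np


@dataclass
class Stroke:
    # Minimal placeholder for a simulated brush stroke (normalized position/size).
    x: float
    y: float
    length: float


def plan_strokes(caption: str, n: int = 60) -> List[Stroke]:
    # Hypothetical stand-in for a text-conditioned stroke planner
    # (a FRIDA-style optimizer would go here).
    rng = random.Random(caption)
    return [Stroke(rng.random(), rng.random(), rng.uniform(0.05, 0.2)) for _ in range(n)]


def render(strokes: List[Stroke], size: int = 256) -> np.ndarray:
    # Hypothetical stand-in for a stroke simulator: each stroke becomes a dark
    # horizontal dab on a white canvas.
    canvas = np.ones((size, size), dtype=np.float32)
    for s in strokes:
        r, c = int(s.y * (size - 1)), int(s.x * (size - 1))
        half = max(1, int(s.length * size / 2))
        canvas[r, max(0, c - half):min(size, c + half)] = 0.2
    return canvas


def make_triplet(caption: str, keep_frac: float = 0.4) -> Tuple[np.ndarray, np.ndarray, str]:
    """Return (partial_canvas, full_canvas, caption) for one training example."""
    strokes = plan_strokes(caption)
    full = render(strokes)                       # finished drawing
    kept = random.sample(strokes, max(1, int(keep_frac * len(strokes))))
    partial = render(kept)                       # canvas with pre-existing content
    return partial, full, caption


if __name__ == "__main__":
    partial, full, caption = make_triplet("a sailboat at sunset")
    print(partial.shape, full.shape, caption)
```

Triplets of this kind could then be used to fine-tune an instruction-following image-editing model (in the spirit of InstructPix2Pix, cited below) so that it learns to complete a partially painted canvas under the robot's stroke constraints rather than overwriting existing content.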
- P. Schaldenbrand, J. McCann, and J. Oh, “FRIDA: A collaborative robot painter with a differentiable, real2sim2real planning environment,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 11712–11718.
- H. H. Jiang, L. Brown, J. Cheng, M. Khan, A. Gupta, D. Workman, A. Hanna, J. Flowers, and T. Gebru, “AI art and its impact on artists,” in Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, ser. AIES ’23. New York, NY, USA: Association for Computing Machinery, 2023, pp. 363–374. [Online]. Available: https://doi.org/10.1145/3600211.3604681
- C. Bateman, “Creating for creatives: A humanistic approach to designing AI tools targeted at professional animators,” Ph.D. dissertation, Harvard University, 2021.
- N. Davis, C.-P. Hsiao, K. Yashraj Singh, L. Li, and B. Magerko, “Empirically studying participatory sense-making in abstract drawing with a co-creative cognitive agent,” in Proceedings of the 21st International Conference on Intelligent User Interfaces, 2016, pp. 196–207.
- F. Ibarrola, T. Lawton, and K. Grace, “A collaborative, interactive and context-aware drawing agent for co-creative design,” IEEE Transactions on Visualization and Computer Graphics, 2023.
- T. Lawton, F. J. Ibarrola, D. Ventura, and K. Grace, “Drawing with Reframer: Emergence and control in co-creative AI,” in Proceedings of the 28th International Conference on Intelligent User Interfaces, 2023, pp. 264–277.
- T. Lawton, K. Grace, and F. J. Ibarrola, “When is a tool a tool? User perceptions of system agency in human–AI co-creative drawing,” in Proceedings of the 2023 ACM Designing Interactive Systems Conference, 2023, pp. 1978–1996.
- C. Jansen and E. Sklar, “Exploring co-creative drawing workflows,” Frontiers in Robotics and AI, vol. 8, p. 577770, 2021.
- S. Lee and W. Ju, “Adversarial robots as creative collaborators,” arXiv preprint arXiv:2402.03691, 2024.
- M. D. Cooney and M. L. R. Menezes, “Design for an art therapy robot: An explorative review of the theoretical foundations for engaging in emotional and creative painting with a robot,” Multimodal Technologies and Interaction, vol. 2, no. 3, p. 52, 2018.
- M. Cooney and P. Berck, “Designing a robot which paints with a human: visual metaphors to convey contingency and artistry,” in ICRA-X Robots Art Program at IEEE International Conference on Robotics and Automation (ICRA), Montreal QC, Canada, 2019, p. 2.
- M. Cooney, “Robot art, in the eye of the beholder?: Personalized metaphors facilitate communication of emotions and creativity,” Frontiers in Robotics and AI, vol. 8, p. 668986, 2021.
- S. Z. Shaik, V. Srinivasan, Y. Peng, M. Lee, and N. Davis, “Co-creative robotic arm for differently-abled kids: Speech, sketch inputs and external feedbacks for multiple drawings,” in Proceedings of the Future Technologies Conference (FTC) 2020, Volume 3. Springer, 2021, pp. 998–1007.
- Y. Lin, J. Guo, Y. Chen, C. Yao, and F. Ying, “It is your turn: Collaborative ideation with a co-creative robot through sketch,” in Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020, pp. 1–14.
- D. Herath, J. McFarlane, E. A. Jochum, J. B. Grant, and P. Tresset, “Arts + health: New approaches to arts and robots in health care,” in Companion of the 2020 ACM/IEEE International Conference on Human-Robot Interaction, 2020, pp. 1–7.
- P. Schaldenbrand and J. Oh, “Content masked loss: Human-like brush stroke planning in a reinforcement learning painting agent,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 1, 2021, pp. 505–512.
- T. Lindemeier, “e-David: Non-photorealistic rendering using a robot and visual feedback,” Ph.D. dissertation, University of Konstanz, 2018.
- S. Wang, J. Chen, X. Deng, S. Hutchinson, and F. Dellaert, “Robot calligraphy using pseudospectral optimal control in conjunction with a novel dynamic brush model,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2020, pp. 6696–6703.
- Rob Carter and Nick Carter, “Dark factory portraits,” http://www.robandnick.com/dark-factory-portraits, 2017.
- A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever, “Learning transferable visual models from natural language supervision,” CoRR, vol. abs/2103.00020, 2021. [Online]. Available: https://arxiv.org/abs/2103.00020
- R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” arXiv preprint arXiv:2112.10752, 2021.
- T. Brooks, A. Holynski, and A. A. Efros, “InstructPix2Pix: Learning to follow image editing instructions,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 18392–18402.
- D. Ha and D. Eck, “A neural representation of sketch drawings,” arXiv preprint arXiv:1704.03477, 2017.
- A. Bidgoli, M. L. De Guevara, C. Hsiung, J. Oh, and E. Kang, “Artistic style in robotic painting: A machine learning approach to learning brushstroke from human artists,” in 2020 29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN). IEEE, 2020, pp. 412–418.
- G. Lee, M. Kim, M. Lee, and B.-T. Zhang, “From scratch to sketch: Deep decoupled hierarchical reinforcement learning for robotic sketching agent,” in 2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 5553–5559.
- M. C. Sola and V. Guljajeva, “Dream Painter: Exploring creative possibilities of AI-aided speech-to-image synthesis in the interactive art context,” Proceedings of the ACM on Computer Graphics and Interactive Techniques, vol. 5, no. 4, pp. 1–11, 2022.
- J. Jongejan, H. Rowley, T. Kawashima, J. Kim, and N. Fox-Gieg, “The Quick, Draw! AI experiment,” Mountain View, CA, 2016, accessed Feb. 17, 2018.
- D. Parikh and C. L. Zitnick, “Exploring crowd co-creation scenarios for sketches,” arXiv preprint arXiv:2005.07328, 2020.
- J. Shen et al., “Co-drawings,” 2016. [Online]. Available: https://www.codrawseattle.com/
- C. Schuhmann, R. Beaumont, R. Vencu, C. Gordon, R. Wightman, M. Cherti, T. Coombes, A. Katta, C. Mullis, M. Wortsman, et al., “LAION-5B: An open large-scale dataset for training next generation image-text models,” Advances in Neural Information Processing Systems, vol. 35, pp. 25278–25294, 2022.
- Y. Vinker, E. Pajouheshgar, J. Y. Bo, R. C. Bachmann, A. H. Bermano, D. Cohen-Or, A. Zamir, and A. Shamir, “CLIPasso: Semantically-aware object sketching,” arXiv preprint arXiv:2202.05822, 2022.
- A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo, et al., “Segment anything,” arXiv preprint arXiv:2304.02643, 2023.
- J. Hessel, A. Holtzman, M. Forbes, R. L. Bras, and Y. Choi, “CLIPScore: A reference-free evaluation metric for image captioning,” arXiv preprint arXiv:2104.08718, 2021.
- J. Li, D. Li, C. Xiong, and S. Hoi, “BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation,” in International Conference on Machine Learning. PMLR, 2022, pp. 12888–12900.
- J. Yu, Y. Xu, J. Y. Koh, T. Luong, G. Baid, Z. Wang, V. Vasudevan, A. Ku, Y. Yang, B. K. Ayan, et al., “Scaling autoregressive models for content-rich text-to-image generation,” arXiv preprint arXiv:2206.10789, vol. 2, no. 3, p. 5, 2022.
- T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V. Springer, 2014, pp. 740–755.
- A. Birhane, V. U. Prabhu, and E. Kahembwe, “Multimodal datasets: misogyny, pornography, and malignant stereotypes,” arXiv preprint arXiv:2110.01963, 2021.