CoFRIDA: Self-Supervised Fine-Tuning for Human-Robot Co-Painting (2402.13442v1)

Published 21 Feb 2024 in cs.RO

Abstract: Prior robot painting and drawing work, such as FRIDA, has focused on decreasing the sim-to-real gap and expanding input modalities for users, but the interaction with these systems generally exists only in the input stages. To support interactive, human-robot collaborative painting, we introduce the Collaborative FRIDA (CoFRIDA) robot painting framework, which can co-paint by modifying and engaging with content already painted by a human collaborator. To improve text-image alignment, FRIDA's major weakness, our system uses pre-trained text-to-image models; however, pre-trained models in the context of real-world co-painting do not perform well because they (1) do not understand the constraints and abilities of the robot and (2) cannot perform co-painting without making unrealistic edits to the canvas and overwriting content. We propose a self-supervised fine-tuning procedure that can tackle both issues, allowing the use of pre-trained state-of-the-art text-image alignment models with robots to enable co-painting in the physical world. Our open-source approach, CoFRIDA, creates paintings and drawings that match the input text prompt more clearly than FRIDA, both from a blank canvas and one with human-created work. More generally, our fine-tuning procedure successfully encodes the robot's constraints and abilities into a foundation model, showcasing promising results as an effective method for reducing sim-to-real gaps.
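To make the described fine-tuning procedure concrete, below is a minimal, hypothetical sketch of how self-supervised co-painting training data could be assembled. The function names (simulate_strokes, render, make_copaint_triplet) are placeholders for illustration, not CoFRIDA's actual API: a stroke planner reproduces a captioned image using the robot's simulated strokes, and a random prefix of those strokes is rendered as the "partial canvas," so an instruction-based image-editing model learns to complete, rather than overwrite, existing content.

```python
# Hypothetical sketch of the self-supervised data-generation idea from the abstract:
# render a target image with the robot's simulated strokes, then keep only a random
# prefix of strokes to form a "partial canvas", yielding (partial, full, caption)
# triplets for fine-tuning a pre-trained image-editing model.
# All names below are placeholders, not CoFRIDA's real interface.

import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stroke:
    # Simplified stroke parameterization; the real planner would use the robot's
    # brush model and physical constraints (reachability, stroke length, etc.).
    path: Tuple[float, ...]
    thickness: float


def simulate_strokes(image, max_strokes: int = 300) -> List[Stroke]:
    """Placeholder: plan a stroke sequence that reproduces `image` in simulation."""
    raise NotImplementedError


def render(strokes: List[Stroke]):
    """Placeholder: render a stroke list onto a blank canvas (sim-to-real renderer)."""
    raise NotImplementedError


def make_copaint_triplet(image, caption: str, keep_frac_range=(0.1, 0.8)):
    """Create one (partial_canvas, full_canvas, caption) training example."""
    strokes = simulate_strokes(image)               # full stroke plan for the image
    keep = random.uniform(*keep_frac_range)
    partial = strokes[: int(len(strokes) * keep)]   # treat this prefix as "already painted"
    return render(partial), render(strokes), caption
```

Pairs generated this way would bake the robot's stroke constraints into the fine-tuning data, which is the mechanism the abstract credits for encoding the robot's abilities into a foundation model and reducing the sim-to-real gap.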

References (37)
  1. P. Schaldenbrand, J. McCann, and J. Oh, “FRIDA: A collaborative robot painter with a differentiable, real2sim2real planning environment,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 11712–11718.
  2. H. H. Jiang, L. Brown, J. Cheng, M. Khan, A. Gupta, D. Workman, A. Hanna, J. Flowers, and T. Gebru, “AI art and its impact on artists,” in Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, ser. AIES ’23. New York, NY, USA: Association for Computing Machinery, 2023, pp. 363–374. [Online]. Available: https://doi.org/10.1145/3600211.3604681
  3. C. Bateman, “Creating for creatives: A humanistic approach to designing AI tools targeted at professional animators,” Ph.D. dissertation, Harvard University, 2021.
  4. N. Davis, C.-P. Hsiao, K. Yashraj Singh, L. Li, and B. Magerko, “Empirically studying participatory sense-making in abstract drawing with a co-creative cognitive agent,” in Proceedings of the 21st International Conference on Intelligent User Interfaces, 2016, pp. 196–207.
  5. F. Ibarrola, T. Lawton, and K. Grace, “A collaborative, interactive and context-aware drawing agent for co-creative design,” IEEE Transactions on Visualization and Computer Graphics, 2023.
  6. T. Lawton, F. J. Ibarrola, D. Ventura, and K. Grace, “Drawing with Reframer: Emergence and control in co-creative AI,” in Proceedings of the 28th International Conference on Intelligent User Interfaces, 2023, pp. 264–277.
  7. T. Lawton, K. Grace, and F. J. Ibarrola, “When is a tool a tool? User perceptions of system agency in human–AI co-creative drawing,” in Proceedings of the 2023 ACM Designing Interactive Systems Conference, 2023, pp. 1978–1996.
  8. C. Jansen and E. Sklar, “Exploring co-creative drawing workflows,” Frontiers in Robotics and AI, vol. 8, p. 577770, 2021.
  9. S. Lee and W. Ju, “Adversarial robots as creative collaborators,” arXiv preprint arXiv:2402.03691, 2024.
  10. M. D. Cooney and M. L. R. Menezes, “Design for an art therapy robot: An explorative review of the theoretical foundations for engaging in emotional and creative painting with a robot,” Multimodal Technologies and Interaction, vol. 2, no. 3, p. 52, 2018.
  11. M. Cooney and P. Berck, “Designing a robot which paints with a human: visual metaphors to convey contingency and artistry,” in ICRA-X Robots Art Program at IEEE International Conference on Robotics and Automation (ICRA), Montreal QC, Canada, 2019, p. 2.
  12. M. Cooney, “Robot art, in the eye of the beholder?: Personalized metaphors facilitate communication of emotions and creativity,” Frontiers in Robotics and AI, vol. 8, p. 668986, 2021.
  13. S. Z. Shaik, V. Srinivasan, Y. Peng, M. Lee, and N. Davis, “Co-creative robotic arm for differently-abled kids: Speech, sketch inputs and external feedbacks for multiple drawings,” in Proceedings of the Future Technologies Conference (FTC) 2020, Volume 3.   Springer, 2021, pp. 998–1007.
  14. Y. Lin, J. Guo, Y. Chen, C. Yao, and F. Ying, “It is your turn: Collaborative ideation with a co-creative robot through sketch,” in Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020, pp. 1–14.
  15. D. Herath, J. McFarlane, E. A. Jochum, J. B. Grant, and P. Tresset, “Arts + health: New approaches to arts and robots in health care,” in Companion of the 2020 ACM/IEEE International Conference on Human-Robot Interaction, 2020, pp. 1–7.
  16. P. Schaldenbrand and J. Oh, “Content masked loss: Human-like brush stroke planning in a reinforcement learning painting agent,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 1, 2021, pp. 505–512.
  17. T. Lindemeier, “e-David: Non-photorealistic rendering using a robot and visual feedback,” Ph.D. dissertation, University of Konstanz, 2018.
  18. S. Wang, J. Chen, X. Deng, S. Hutchinson, and F. Dellaert, “Robot calligraphy using pseudospectral optimal control in conjunction with a novel dynamic brush model,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2020, pp. 6696–6703.
  19. Rob Carter and Nick Carter, “Dark factory portraits,” http://www.robandnick.com/dark-factory-portraits, 2017.
  20. A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever, “Learning transferable visual models from natural language supervision,” CoRR, vol. abs/2103.00020, 2021. [Online]. Available: https://arxiv.org/abs/2103.00020
  21. R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” 2021.
  22. T. Brooks, A. Holynski, and A. A. Efros, “InstructPix2Pix: Learning to follow image editing instructions,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 18392–18402.
  23. D. Ha and D. Eck, “A neural representation of sketch drawings,” arXiv preprint arXiv:1704.03477, 2017.
  24. A. Bidgoli, M. L. De Guevara, C. Hsiung, J. Oh, and E. Kang, “Artistic style in robotic painting; a machine learning approach to learning brushstroke from human artists,” in 2020 29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN).   IEEE, 2020, pp. 412–418.
  25. G. Lee, M. Kim, M. Lee, and B.-T. Zhang, “From scratch to sketch: Deep decoupled hierarchical reinforcement learning for robotic sketching agent,” in 2022 International Conference on Robotics and Automation (ICRA).   IEEE, 2022, pp. 5553–5559.
  26. M. C. Sola and V. Guljajeva, “Dream Painter: Exploring creative possibilities of AI-aided speech-to-image synthesis in the interactive art context,” Proceedings of the ACM on Computer Graphics and Interactive Techniques, vol. 5, no. 4, pp. 1–11, 2022.
  27. J. Jongejan, H. Rowley, T. Kawashima, J. Kim, and N. Fox-Gieg, “The Quick, Draw! AI experiment,” Mountain View, CA, 2016, accessed Feb. 17, 2018.
  28. D. Parikh and C. L. Zitnick, “Exploring crowd co-creation scenarios for sketches,” arXiv preprint arXiv:2005.07328, 2020.
  29. J. Shen et al., “Co-drawings,” 2016. [Online]. Available: https://www.codrawseattle.com/
  30. C. Schuhmann, R. Beaumont, R. Vencu, C. Gordon, R. Wightman, M. Cherti, T. Coombes, A. Katta, C. Mullis, M. Wortsman, et al., “LAION-5B: An open large-scale dataset for training next generation image-text models,” Advances in Neural Information Processing Systems, vol. 35, pp. 25278–25294, 2022.
  31. Y. Vinker, E. Pajouheshgar, J. Y. Bo, R. C. Bachmann, A. H. Bermano, D. Cohen-Or, A. Zamir, and A. Shamir, “CLIPasso: Semantically-aware object sketching,” arXiv preprint arXiv:2202.05822, 2022.
  32. A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo, et al., “Segment anything,” arXiv preprint arXiv:2304.02643, 2023.
  33. J. Hessel, A. Holtzman, M. Forbes, R. L. Bras, and Y. Choi, “CLIPScore: A reference-free evaluation metric for image captioning,” arXiv preprint arXiv:2104.08718, 2021.
  34. J. Li, D. Li, C. Xiong, and S. Hoi, “BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation,” in International Conference on Machine Learning. PMLR, 2022, pp. 12888–12900.
  35. J. Yu, Y. Xu, J. Y. Koh, T. Luong, G. Baid, Z. Wang, V. Vasudevan, A. Ku, Y. Yang, B. K. Ayan, et al., “Scaling autoregressive models for content-rich text-to-image generation,” arXiv preprint arXiv:2206.10789, vol. 2, no. 3, p. 5, 2022.
  36. T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13. Springer, 2014, pp. 740–755.
  37. A. Birhane, V. U. Prabhu, and E. Kahembwe, “Multimodal datasets: misogyny, pornography, and malignant stereotypes,” arXiv preprint arXiv:2110.01963, 2021.