BlendScape: Enabling End-User Customization of Video-Conferencing Environments through Generative AI (2403.13947v2)
Abstract: Today's video-conferencing tools support a rich range of professional and social activities, but their generic meeting environments cannot be dynamically adapted to align with distributed collaborators' needs. To enable end-user customization, we developed BlendScape, a rendering and composition system for video-conferencing participants to tailor environments to their meeting context by leveraging AI image generation techniques. BlendScape supports flexible representations of task spaces by blending users' physical or digital backgrounds into unified environments and implements multimodal interaction techniques to steer the generation. Through an exploratory study with 15 end-users, we investigated whether and how they would find value in using generative AI to customize video-conferencing environments. Participants envisioned using a system like BlendScape to facilitate collaborative activities in the future, but required further controls to mitigate distracting or unrealistic visual elements. We implemented scenarios to demonstrate BlendScape's expressiveness for supporting environment design strategies from prior work and propose composition techniques to improve the quality of environments.