Sophia-in-Audition: Virtual Production with a Robot Performer (2402.06978v1)
Abstract: We present Sophia-in-Audition (SiA), a new frontier in virtual production, by employing the humanoid robot Sophia within an UltraStage environment composed of a controllable lighting dome coupled with multiple cameras. We demonstrate Sophia's capability to replicate iconic film segments, follow real performers, and perform a variety of motions and expressions, showcasing her versatility as a virtual actor. Key to this process is the integration of facial motion transfer algorithms and the UltraStage's controllable lighting and multi-camera setup, enabling dynamic performances that align with the director's vision. Our comprehensive user studies indicate positive audience reception towards Sophia's performances, highlighting her potential to reduce the uncanny valley effect in virtual acting. Additionally, the immersive lighting in dynamic clips was highly rated for its naturalness and its ability to mirror professional film standards. The paper presents a first-of-its-kind multi-view robot performance video dataset with dynamic lighting, offering valuable insights for future enhancements in humanoid robotic performers and virtual production techniques. This research contributes significantly to the field by presenting a unique virtual production setup, developing tools for sophisticated performance control, and providing a comprehensive dataset and user study analysis for diverse applications.