Expression-aware video inpainting for HMD removal in XR applications (2401.14136v1)
Abstract: Head-mounted displays (HMDs) serve as indispensable devices for observing extended reality (XR) environments and virtual content. However, HMDs present an obstacle to external recording techniques as they block the upper face of the user. This limitation significantly affects social XR applications, specifically teleconferencing, where facial features and eye gaze information play a vital role in creating an immersive user experience. In this study, we propose a new network for expression-aware video inpainting for HMD removal (EVI-HRnet) based on generative adversarial networks (GANs). Our model effectively fills in missing information with regard to facial landmarks and a single occlusion-free reference image of the user. The framework and its components ensure the preservation of the user's identity across frames using the reference frame. To further improve the level of realism of the inpainted output, we introduce a novel facial expression recognition (FER) loss function for emotion preservation. Our results demonstrate the remarkable capability of the proposed framework to remove HMDs from facial videos while maintaining the subject's facial expression and identity. Moreover, the outputs exhibit temporal consistency along the inpainted frames. This lightweight framework presents a practical approach for HMD occlusion removal, with the potential to enhance various collaborative XR applications without the need for additional hardware.
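The abstract names a facial expression recognition (FER) loss but does not give its formulation. As a minimal sketch only, one plausible form of such a penalty compares expression feature vectors for the inpainted frames against those for the ground-truth frames. The function names `expression_distance` and `fer_loss`, the choice of an L1 distance, and the assumption that a frozen FER network has already produced one feature vector per frame are all illustrative and are not taken from the paper.

```python
from typing import Sequence

Vector = Sequence[float]


def expression_distance(pred: Vector, target: Vector) -> float:
    """Mean absolute difference between two expression feature vectors.

    Assumption: both vectors come from the same (hypothetical) frozen FER
    network, e.g. class probabilities or an embedding.
    """
    if len(pred) != len(target):
        raise ValueError("feature vectors must have the same length")
    if len(pred) == 0:
        raise ValueError("feature vectors must be non-empty")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def fer_loss(pred_frames: Sequence[Vector], target_frames: Sequence[Vector]) -> float:
    """Average the per-frame expression distance over a video clip."""
    if len(pred_frames) != len(target_frames):
        raise ValueError("clips must contain the same number of frames")
    if len(pred_frames) == 0:
        raise ValueError("clips must contain at least one frame")
    total = sum(expression_distance(p, t) for p, t in zip(pred_frames, target_frames))
    return total / len(pred_frames)


# Hypothetical two-frame clip with two expression classes per frame.
inpainted = [[1.0, 0.0], [0.0, 1.0]]
ground_truth = [[1.0, 0.0], [1.0, 0.0]]
loss = fer_loss(inpainted, ground_truth)
```

In this toy clip the first frame matches the target expression and the second does not, so the penalty is driven entirely by the second frame. In a GAN training setup, a term of this kind would typically be weighted and added to the reconstruction and adversarial losses; the weighting is likewise not specified in the abstract.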
Authors: Fatemeh Ghorbani Lohesara, Karen Egiazarian, Sebastian Knorr