HEADSET: Human emotion awareness under partial occlusions multimodal dataset (2402.09107v1)
Abstract: The volumetric representation of human interactions is fundamental to the development of immersive media productions and telecommunication applications. Particularly given the rapid advancement of Extended Reality (XR) applications, volumetric data has proven essential to future XR development. In this work, we present a new multimodal database to help advance the development of immersive technologies. The proposed database provides ethically compliant and diverse volumetric data, in particular recordings of 27 participants displaying posed facial expressions and subtle body movements while speaking, plus 11 participants wearing head-mounted displays (HMDs). The recording system consists of a volumetric capture (VoCap) studio comprising 31 synchronized modules with 62 RGB cameras and 31 depth cameras. In addition to textured meshes, point clouds, and multi-view RGB-D data, we use one Lytro Illum camera to capture light field (LF) data simultaneously. Finally, we evaluate the use of our dataset on three tasks: facial expression classification, HMD removal, and point cloud reconstruction. The dataset can support the evaluation and performance testing of various XR algorithms, including but not limited to facial expression recognition and reconstruction, facial reenactment, and volumetric video. HEADSET, together with all its associated raw data and license agreement, will be publicly available for research purposes.