Fast Registration of Photorealistic Avatars for VR Facial Animation (2401.11002v2)
Abstract: Virtual Reality (VR) bears the promise of social interactions that can feel more immersive than other media. Key to this is the ability to accurately animate a personalized photorealistic avatar while the user is wearing a VR headset, and hence the acquisition of labels for headset-mounted camera (HMC) images needs to be efficient and accurate. This is challenging due to oblique camera views and differences in image modality. In this work, we first show that the domain gap between the avatar and HMC images is one of the primary sources of difficulty: a transformer-based architecture achieves high accuracy on domain-consistent data, but degrades when the domain gap is re-introduced. Building on this finding, we propose a system split into two parts: an iterative refinement module that takes in-domain inputs, and a generic avatar-guided image-to-image domain transfer module conditioned on current estimates. These two modules reinforce each other: domain transfer becomes easier when close-to-ground-truth examples are shown, and better domain-gap removal in turn improves the registration. Our system obviates the need for costly offline optimization and produces online registration of higher quality than direct regression methods. We validate the accuracy and efficiency of our approach through extensive experiments on a commodity headset, demonstrating significant improvements over these baselines. To stimulate further research in this direction, we make our large-scale dataset and code publicly available.
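The interplay between the two modules described in the abstract can be illustrated with a minimal control-flow sketch. This is not the authors' implementation: the function name `register`, its parameters, the fixed number of steps, and the scalar stand-ins for images and expression estimates are all hypothetical, chosen only to show how a domain transfer step conditioned on the current estimate might alternate with a refinement step that sees in-domain inputs.

```python
from typing import Callable

def register(
    hmc_image: float,
    init_estimate: float,
    render: Callable[[float], float],
    transfer: Callable[[float, float], float],
    refine: Callable[[float, float, float], float],
    num_steps: int = 3,  # assumed; the abstract does not state a step count
) -> float:
    """Alternate domain transfer and refinement (hypothetical sketch)."""
    estimate = init_estimate
    for _ in range(num_steps):
        # Render the avatar at the current estimate (in-domain image).
        rendered = render(estimate)
        # Map the HMC image into the avatar domain, conditioned on the render.
        in_domain = transfer(hmc_image, rendered)
        # Update the estimate from in-domain inputs only.
        estimate = refine(in_domain, rendered, estimate)
    return estimate

# Toy scalar stand-ins (assumptions, not learned networks):
# the "true" expression is 2.0 and the domain gap is a constant offset of 1.0.
toy_render = lambda z: z
toy_transfer = lambda image, conditioning: image - 1.0
toy_refine = lambda in_domain, rendered, z: z + 0.5 * (in_domain - rendered)

estimate = register(3.0, 0.0, toy_render, toy_transfer, toy_refine)
print(estimate)  # 1.75, moving toward the true value 2.0
```

In this toy setting each pass halves the remaining error, so the estimate moves from 0.0 to 1.0, 1.5, and 1.75. In the actual system both modules would be learned networks operating on images.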