BiTT: Bi-directional Texture Reconstruction of Interacting Two Hands from a Single Image (2403.08262v4)
Abstract: Creating personalized hand avatars is important for offering a realistic experience to users on AR / VR platforms. While most prior studies focused on reconstructing 3D hand shapes, some recent work has tackled the reconstruction of hand textures on top of shapes. However, these methods are often limited to capturing pixels on the visible side of a hand, requiring diverse views of the hand in a video or multiple images as input. In this paper, we propose a novel method, BiTT (Bi-directional Texture reconstruction of Two hands), which is the first end-to-end trainable method for relightable, pose-free texture reconstruction of two interacting hands from only a single RGB image, built on three novel components: 1) bi-directional (left $\leftrightarrow$ right) texture reconstruction exploiting the texture symmetry of the left / right hands, 2) a texture parametric model for hand texture recovery, and 3) an overall coarse-to-fine pipeline for reconstructing the personalized texture of two interacting hands. BiTT first estimates the scene light condition and an albedo image from the input image, then reconstructs the textures of both hands through the texture parametric model and the bi-directional texture reconstructor. In experiments on the InterHand2.6M and RGB2Hands datasets, our method significantly outperforms state-of-the-art hand texture reconstruction methods both quantitatively and qualitatively. The code is available at https://github.com/yunminjin2/BiTT
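To make the three-stage flow in the abstract concrete, the sketch below outlines one plausible structure: a light / albedo estimator, a linear texture parametric model for the coarse stage, and a bi-directional refiner that exploits left / right texture symmetry. All module names (`LightAlbedoEstimator`, `CoarseTextureModel`, `BiDirectionalRefiner`), tensor sizes, and the horizontal-flip symmetry operation are illustrative assumptions, not the authors' released implementation.

```python
# Minimal, hypothetical sketch of a BiTT-style coarse-to-fine texture pipeline.
import torch
import torch.nn as nn


class LightAlbedoEstimator(nn.Module):
    """Stage 1 (assumed): predict a lighting code and an albedo image from the input image."""

    def __init__(self, light_dim=9):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.light_head = nn.Linear(64, light_dim)        # e.g. SH-style lighting coefficients
        self.albedo_head = nn.Conv2d(3, 3, 3, padding=1)  # placeholder albedo decoder

    def forward(self, img):
        return self.light_head(self.encoder(img)), torch.sigmoid(self.albedo_head(img))


class CoarseTextureModel(nn.Module):
    """Stage 2 (assumed): a linear texture parametric model (mean + basis, HTML-like)."""

    def __init__(self, n_params=100, tex_res=128):
        super().__init__()
        self.tex_res = tex_res
        self.basis = nn.Parameter(torch.randn(n_params, 3 * tex_res * tex_res) * 0.01)
        self.mean = nn.Parameter(torch.zeros(3 * tex_res * tex_res))

    def forward(self, params):
        tex = self.mean + params @ self.basis
        return tex.view(-1, 3, self.tex_res, self.tex_res)


class BiDirectionalRefiner(nn.Module):
    """Stage 3 (assumed): refine each hand's UV texture using the other hand's mirrored texture."""

    def __init__(self):
        super().__init__()
        self.fuse = nn.Conv2d(6, 3, 3, padding=1)

    def forward(self, tex_left, tex_right):
        # Texture symmetry assumption: a horizontal flip maps one hand's UV map onto the other's.
        left = self.fuse(torch.cat([tex_left, torch.flip(tex_right, dims=[-1])], dim=1))
        right = self.fuse(torch.cat([tex_right, torch.flip(tex_left, dims=[-1])], dim=1))
        return left, right


if __name__ == "__main__":
    img = torch.rand(1, 3, 256, 256)                        # single RGB input image
    light, albedo = LightAlbedoEstimator()(img)             # stage 1: light + albedo
    coarse = CoarseTextureModel()
    tex_l, tex_r = coarse(torch.zeros(1, 100)), coarse(torch.zeros(1, 100))  # stage 2: coarse textures
    fine_l, fine_r = BiDirectionalRefiner()(tex_l, tex_r)   # stage 3: bi-directional refinement
    print(light.shape, albedo.shape, fine_l.shape, fine_r.shape)
```

In the actual method, the estimated lighting and albedo would be composed by a differentiable renderer to yield the relightable result described in the abstract; that rendering step is omitted from this sketch.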
- Photorealistic monocular 3d reconstruction of humans wearing clothing. In CVPR, 2022.
- Measuring generalisation to unseen viewpoints, articulations, shapes and objects for 3d hand pose estimation under hand-object interaction. In ECCV, 2020.
- Pushing the envelope for rgb-based dense 3d hand pose estimation via neural rendering. In CVPR, 2019.
- Weakly-supervised domain adaptation via gan and mesh model for estimating 3d hand poses interacting objects. In CVPR, 2020.
- Hand avatar: Free-pose hand animation and rendering from monocular video. In CVPR, 2023.
- Model-based 3d hand reconstruction via self-supervised learning. In CVPR, 2021.
- Lisa: Learning implicit shape and appearance of hands. In CVPR, 2022.
- First-person hand action benchmark with rgb-d videos and 3d hand pose annotations. In CVPR, 2018.
- Shape and viewpoints without keypoints. In ECCV, 2020.
- Handnerf: Neural radiance fields for animatable interacting hands. In CVPR, 2023.
- Self-supervised 3d mesh reconstruction from single images. In CVPR, 2021.
- Hvtr: Hybrid volumetric-textural rendering for human avatars. In 3DV, 2022.
- A probabilistic attention model with occlusion-aware texture regression for 3d hand reconstruction from a single rgb image. In CVPR, 2023.
- Learning category-specific mesh reconstruction from image collections. In ECCV, 2018.
- Grasping field: Learning implicit representations for human grasps. In 3DV, 2020.
- A skeleton-driven neural occupancy representation for articulated hands. In 3DV, 2021.
- Harp: Personalized hand reconstruction from a monocular rgb video. In CVPR, 2023.
- Fourierhandflow: Neural 4d hand representation using fourier query flow. In NeurIPS, 2023.
- Im2hands: Learning attentive implicit representation of interacting two-hand shapes. In CVPR, 2023.
- A hierarchical representation network for accurate and detailed face reconstruction from in-the-wild images. In CVPR, 2023.
- Interacting attention graph for single image two-hand reconstruction. In CVPR, 2022.
- Tava: Template-free animatable volumetric actors. In ECCV, 2022.
- Self-supervised single-view 3d reconstruction via semantic consistency. In ECCV, 2020.
- Nimble: A non-rigid hand model with bones and muscles. ACM TOG, 2022.
- Otavatar: One-shot talking face avatar with controllable tri-plane rendering. In CVPR, 2023.
- Realfusion: 360° reconstruction of any object from a single image. In CVPR, 2023.
- Nerf: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020.
- Interhand2.6m: A dataset and baseline for 3d interacting hand pose estimation from a single rgb image. In ECCV, 2020.
- Livehand: Real-time and photorealistic neural hand rendering. In ICCV, 2023.
- Neural articulated radiance field. In ICCV, 2021.
- Texture fields: Learning texture representations in function space. In ICCV, 2019.
- Nerfies: Deformable neural radiance fields. In ICCV, 2021.
- Bui Tuong Phong. Illumination for Computer Generated Pictures. Association for Computing Machinery, 1998.
- HTML: A Parametric Hand Texture Model for 3D Hand Reconstruction and Personalization. In ECCV, 2020.
- High-resolution image synthesis with latent diffusion models. arXiv:2112.10752, 2021.
- Embodied hands: Modeling and capturing hands and bodies together. ACM TOG, 2017.
- Pifu: Pixel-aligned implicit function for high-resolution clothed human digitization. In ICCV, 2019.
- Sfsnet: Learning shape, reflectance and illuminance of faces in the wild. In CVPR, 2018.
- Anything-3d: Towards single-view anything reconstruction in the wild. arXiv:2304.10261, 2023.
- Rgb2hands: Real-time tracking of 3d hand interactions from monocular rgb video. ACM TOG, 2020.
- Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
- De-rendering the world’s revolutionary artefacts. In CVPR, 2021.
- ICON: Implicit Clothed humans Obtained from Normals. In CVPR, 2022.
- Shelf-supervised mesh prediction in the wild. In CVPR, 2021.
- What’s in your hands? 3d reconstruction of generic objects in hands. In CVPR, 2022.
- pixelnerf: Neural radiance fields from one or few images. In CVPR, 2021.
- Graphics capsule: Learning hierarchical 3d face representations from 2d images. In CVPR, 2023.
- Acr: Attention collaboration-based regressor for arbitrary two-hand reconstruction. In CVPR, 2023.
- The unreasonable effectiveness of deep features as a perceptual metric. In CVPR, 2018.