MobRecon: Mobile-Friendly Hand Mesh Reconstruction from Monocular Image (2112.02753v2)

Published 6 Dec 2021 in cs.CV

Abstract: In this work, we propose a framework for single-view hand mesh reconstruction, which can simultaneously achieve high reconstruction accuracy, fast inference speed, and temporal coherence. Specifically, for 2D encoding, we propose lightweight yet effective stacked structures. Regarding 3D decoding, we provide an efficient graph operator, namely depth-separable spiral convolution. Moreover, we present a novel feature lifting module for bridging the gap between 2D and 3D representations. This module begins with a map-based position regression (MapReg) block to integrate the merits of both heatmap encoding and position regression paradigms for improved 2D accuracy and temporal coherence. Furthermore, MapReg is followed by pose pooling and pose-to-vertex lifting approaches, which transform 2D pose encodings to semantic features of 3D vertices. Overall, our hand reconstruction framework, called MobRecon, comprises affordable computational costs and miniature model size, which reaches a high inference speed of 83FPS on Apple A14 CPU. Extensive experiments on popular datasets such as FreiHAND, RHD, and HO3Dv2 demonstrate that our MobRecon achieves superior performance on reconstruction accuracy and temporal coherence. Our code is publicly available at https://github.com/SeanChenxy/HandMesh.


Summary

The paper proposes a mobile-friendly network architecture using GhostNet layers and self-supervised learning for efficient 3D hand mesh reconstruction.
It reports superior reconstruction accuracy and temporal coherence on FreiHAND, RHD, and HO3Dv2, with a small model that runs at 83 FPS on an Apple A14 CPU.
The work advances augmented reality by enabling precise hand pose estimation on resource-constrained devices and suggests further research in interactive object handling.

Overview of MobRecon: Mobile-Friendly Hand Mesh Reconstruction from Monocular Image

This paper presents "MobRecon," an approach to hand mesh reconstruction from a single monocular image, specifically tailored for mobile platforms. Developed by researchers at Y-tech, Kuaishou Technology, and other institutions, MobRecon targets both the computational efficiency and the accuracy needed to deploy hand mesh reconstruction on mobile devices.

Methodology

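The authors present a lightweight convolutional network that maps a single RGB image to a 3D hand mesh, designed to balance efficiency against the accuracy of hand pose and shape estimation. For 2D encoding, it uses lightweight stacked structures. A feature lifting module then bridges the 2D and 3D representations: it begins with a map-based position regression (MapReg) block, which integrates the merits of heatmap encoding and position regression for improved 2D accuracy and temporal coherence, and continues with pose pooling and pose-to-vertex lifting, which transform 2D pose encodings into semantic features of 3D vertices.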
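The abstract describes pose pooling and pose-to-vertex lifting as the steps that turn 2D pose encodings into per-vertex features. The following is a minimal sketch of that idea, not the authors' implementation. It assumes nearest-neighbour sampling of the feature map at each 2D joint location and a fixed joint-to-vertex weight matrix; the function names `pose_pooling` and `pose_to_vertex_lifting`, the normalized `(u, v)` coordinate convention, and all numbers are hypothetical. In a trained network the lifting weights would presumably be learned.

```python
from typing import List, Sequence, Tuple


def pose_pooling(feature_map: Sequence[Sequence[Sequence[float]]],
                 joints_uv: Sequence[Tuple[float, float]]) -> List[List[float]]:
    """Sample one C-dimensional feature per joint from a C x H x W map.

    joints_uv holds (u, v) in [0, 1], where u indexes columns and v indexes rows.
    Nearest-neighbour sampling is an assumption made for this sketch.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    pooled = []
    for u, v in joints_uv:
        col = min(width - 1, max(0, round(u * (width - 1))))
        row = min(height - 1, max(0, round(v * (height - 1))))
        pooled.append([feature_map[c][row][col] for c in range(channels)])
    return pooled


def pose_to_vertex_lifting(joint_feats: Sequence[Sequence[float]],
                           lifting: Sequence[Sequence[float]]) -> List[List[float]]:
    """Map N joint features (N x C) to V vertex features (V x C).

    lifting is a V x N weight matrix; each vertex feature is a weighted
    sum of the joint features.
    """
    channels = len(joint_feats[0])
    return [
        [sum(w * jf[c] for w, jf in zip(row, joint_feats)) for c in range(channels)]
        for row in lifting
    ]


# Toy example: 2 channels, a 2 x 2 feature map, 2 joints, 3 vertices.
feature_map = [
    [[1.0, 2.0], [3.0, 4.0]],      # channel 0
    [[10.0, 20.0], [30.0, 40.0]],  # channel 1
]
joints_uv = [(0.0, 0.0), (1.0, 1.0)]              # top-left and bottom-right
lifting = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]    # hypothetical weights

joint_feats = pose_pooling(feature_map, joints_uv)
vertex_feats = pose_to_vertex_lifting(joint_feats, lifting)
```

Here the third vertex receives the average of the two joint features, which illustrates how a sparse set of joint encodings can be spread over a denser set of mesh vertices before 3D decoding.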

A significant emphasis is placed on reducing computational complexity without sacrificing the quality of hand reconstruction. The network takes advantage of GhostNet layers to minimize floating-point operations while maintaining robust feature extraction capabilities. For 3D decoding, it uses an efficient graph operator, depth-separable spiral convolution. Additionally, self-supervised learning techniques are employed to enhance the model's performance on limited labeled datasets.
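The abstract names depth-separable spiral convolution as the graph operator used for 3D decoding. The sketch below shows one way such an operator can be factored, by analogy with depthwise-separable 2D convolution: a depthwise step mixes each channel along a vertex's spiral neighbourhood, and a pointwise step mixes channels. It is an illustration under assumptions, not the paper's code; the name `ds_spiral_conv`, the weight layouts, and the toy spiral indices are hypothetical, and biases and activations are omitted.

```python
from typing import List, Sequence


def ds_spiral_conv(features: Sequence[Sequence[float]],
                   spirals: Sequence[Sequence[int]],
                   depth_w: Sequence[Sequence[float]],
                   point_w: Sequence[Sequence[float]]) -> List[List[float]]:
    """Depthwise-then-pointwise convolution over spiral neighbourhoods.

    features: V x C_in vertex features.
    spirals:  V x S vertex indices, the spiral neighbourhood of each vertex.
    depth_w:  C_in x S depthwise weights (one spiral filter per channel).
    point_w:  C_out x C_in pointwise weights.
    """
    in_channels = len(depth_w)
    out = []
    for spiral in spirals:
        # Depthwise step: each channel aggregates only its own values
        # along the spiral, with no cross-channel mixing.
        mixed = [
            sum(depth_w[c][s] * features[v][c] for s, v in enumerate(spiral))
            for c in range(in_channels)
        ]
        # Pointwise step: a per-vertex linear layer mixes the channels.
        out.append([sum(w * m for w, m in zip(row, mixed)) for row in point_w])
    return out


# Toy mesh: 3 vertices, 2 input channels, spiral length 2, 1 output channel.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
spirals = [[0, 1], [1, 2], [2, 0]]
depth_w = [[1.0, 0.0], [0.5, 0.5]]
point_w = [[1.0, 1.0]]

decoded = ds_spiral_conv(features, spirals, depth_w, point_w)
```

In this factorization the weight count is S·C_in + C_in·C_out, whereas a spiral convolution that concatenates the whole neighbourhood into one linear layer needs S·C_in·C_out. With illustrative values S = 9 and C_in = C_out = 64 (not taken from the paper), that is 4,672 weights instead of 36,864.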

Experimental Evaluation

The authors conduct extensive experiments on established benchmarks such as FreiHAND, RHD, and HO3Dv2. MobRecon demonstrates superior reconstruction accuracy and temporal coherence, alongside a small model size and fast inference. For instance, the model runs at 83 FPS on an Apple A14 CPU (roughly 12 ms per frame), making it particularly suitable for real-time applications on mobile devices.

Quantitative results indicate that MobRecon captures intricate hand poses accurately and produces temporally coherent predictions across frames, trading away little precision for its gain in speed.
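Temporal coherence is one of the properties the abstract highlights. This summary does not state how the paper measures it, so the sketch below shows only one common proxy: the mean magnitude of the second-order finite difference (acceleration) of predicted 3D keypoints over consecutive frames, where lower values indicate less jitter. The function name `mean_acceleration` and the toy tracks are hypothetical, and the paper's own metric may differ.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def mean_acceleration(track: Sequence[Sequence[Point]]) -> float:
    """Mean magnitude of the per-keypoint second-order difference.

    track is a list of frames; each frame is a list of (x, y, z) keypoints
    in a consistent order. Returns 0.0 when fewer than three frames exist.
    """
    total = 0.0
    count = 0
    for prev, cur, nxt in zip(track, track[1:], track[2:]):
        for p, c, n in zip(prev, cur, nxt):
            # Discrete acceleration: x[t+1] - 2 * x[t] + x[t-1] per axis.
            acc = [pn - 2.0 * pc + pp for pp, pc, pn in zip(p, c, n)]
            total += math.sqrt(sum(a * a for a in acc))
            count += 1
    return total / count if count else 0.0


# A keypoint moving at constant velocity versus one that jumps back and forth.
smooth_track = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)]]
jittery_track = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)]]

smooth_score = mean_acceleration(smooth_track)
jittery_score = mean_acceleration(jittery_track)
```

Constant-velocity motion scores zero under this measure, while frame-to-frame jitter raises it, which is why a proxy of this kind is useful for comparing the stability of per-frame predictions on video.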

Implications and Future Work

MobRecon's contribution extends beyond hand mesh reconstruction by providing insights into developing efficient, high-performance models for resource-constrained environments. This work is particularly consequential for the advancement of augmented reality (AR) applications, enabling realistic hand interactions on mobile devices.

The paper suggests several avenues for future research. One possibility is extending the framework to include interaction with objects, enhancing the model's applicability in complex, dynamic scenes typical in AR experiences. Moreover, continuous refinement of the architecture to exploit next-generation mobile processors may further improve the model's efficiency and scalability.

In summary, this paper outlines a precise and efficient approach to hand reconstruction, optimizing for mobile deployment. The strength of MobRecon lies in its combination of computational efficiency and precision, representing a substantial step towards real-time applications on portable devices.
