Monocular hand pose estimation under complex grasps and occlusions

Develop a monocular 3D hand pose estimation method that accurately reconstructs finger joint articulation, including strongly flexed and overlapping digits during the grasp and transport phases of the Box and Block Test. The method should overcome the articulation errors and missing-finger predictions that current methods such as SAM 3D Body and WiLoR exhibit under self-occlusion and complex hand-object interactions.

Background

The paper qualitatively evaluates several monocular pose estimation models on Box and Block Test videos: PromptHMR, SMPLer-X, and SAM 3D Body for body pose, and WiLoR for hand pose. SAM 3D Body was selected because its body pose estimates were relatively consistent, but both SAM 3D Body and WiLoR showed noticeable discrepancies in hand articulation.

Specifically, the authors observed that fingers appearing strongly flexed in the image were reconstructed as more extended, and that overlapping digits during complex grasps (e.g., the index finger overlapping the middle finger) were not reflected in the predictions. The authors link these issues to the models having been trained primarily on data from healthy populations, with little exposure to the complex, occluded grasps common in the Box and Block Test.
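The paper's comparison was qualitative, but the "reconstructed as more extended" error can be made measurable by comparing flexion angles at individual finger joints. The sketch below is a minimal illustration, not the authors' method. It assumes three consecutive 3D keypoints along one finger (for example MCP, PIP and DIP in a 21-keypoint hand layout) and assumes some reference pose is available, such as a manual annotation; the function names are hypothetical.

```python
import math


def flexion_angle_deg(proximal, joint, distal):
    """Flexion at `joint` in degrees: 0 means the finger is straight there.

    Computed as the angle between the incoming bone (proximal -> joint)
    and the outgoing bone (joint -> distal), so larger means more flexed.
    """
    a = [j - p for p, j in zip(proximal, joint)]
    b = [d - j for j, d in zip(joint, distal)]
    norm_a, norm_b = math.hypot(*a), math.hypot(*b)
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("bone segments must have non-zero length")
    cos = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def under_flexion_deg(pred, ref):
    """Positive when the predicted joint is more extended than the reference.

    `pred` and `ref` are (proximal, joint, distal) triples of 3D points
    (hypothetical inputs: predicted keypoints vs. an assumed reference).
    """
    return flexion_angle_deg(*ref) - flexion_angle_deg(*pred)


# Reference PIP joint bent 90 degrees; prediction renders the finger straight.
ref = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0))
pred = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0))
print(under_flexion_deg(pred, ref))
```

Working from joint angles rather than raw 3D position error isolates articulation from global hand placement and scale, which matches the failure the authors describe: the hand can be located correctly while its fingers are too straight. Any tolerance used to flag a joint as under-flexed would be a project-specific choice.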

Although they chose SAM 3D Body for their analysis, the authors explicitly state that resolving these hand-pose limitations in monocular settings remains, to their knowledge, an open challenge, which motivates improved methods that handle occlusion and complex hand-object interactions.

References

It [SAM 3D Body] produces body pose estimates that were more consistent with observed motion than those of SMPLer-X and PromptHMR, and while it shares the same limitation in hand pose as WiLoR, this limitation remains an open challenge for monocular hand pose estimation, to our knowledge.

— Enhancing Box and Block Test with Computer Vision for Post-Stroke Upper Extremity Motor Evaluation (2603.29101 - Robinson et al., 31 Mar 2026) in Results — Comparison of 3D Pose Estimation Methods, Model selection
