AJAHR: Amputated Joint Aware 3D Mesh Recovery
- The paper introduces a framework that integrates BPAC-Net, a body-part amputation classifier, with a dual-tokenizer mechanism to address anatomical diversity such as limb loss.
- It employs a Vision Transformer backbone and SMPL model with zero-encoding for missing limbs, ensuring robust and anatomically faithful mesh reconstruction.
- The system leverages the large-scale A3D synthetic dataset to train and validate performance, achieving lower errors on amputee cases compared to traditional methods.
Amputated Joint Aware 3D Human Mesh Recovery (AJAHR) is an adaptive framework for 3D human pose and mesh reconstruction designed to address anatomical diversity, specifically limb loss. Traditional human mesh recovery models presume canonical body structures and consequently underperform on amputee subjects, a problem exacerbated by the scarcity of specialized datasets. AJAHR introduces an architecture that combines an amputation-aware classifier (BPAC-Net), a dual-tokenizer pose estimation strategy, and a large-scale synthetic amputee dataset (A3D), enabling robust mesh recovery for both amputee and non-amputee populations.
1. System Architecture and Core Components
AJAHR consists of a Vision Transformer (ViT) backbone that produces image embeddings consumed by a Transformer decoder with two cross-attention pathways: one initializing pose tokens (zero-pose token) and another integrating semantic information (classifier token). The decoder outputs are split into region-specific branches covering the left arm, right arm, left leg, and right leg, each handling region-specific regression (rotation, shape, camera) and amputation classification.
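The decoder layout can be summarized in a minimal PyTorch sketch. The module names, token dimension, per-region joint counts, and exact wiring of the two cross-attention pathways are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class RegionBranch(nn.Module):
    """Per-region head: regresses rotation/shape/camera plus an amputation logit."""
    def __init__(self, dim: int, n_joints: int):
        super().__init__()
        self.rot = nn.Linear(dim, n_joints * 6)   # 6D rotation per joint in the region
        self.shape = nn.Linear(dim, 10)           # SMPL betas (one prediction per branch here)
        self.cam = nn.Linear(dim, 3)              # weak-perspective camera
        self.amp = nn.Linear(dim, 1)              # amputation logit for this region

    def forward(self, x):
        return self.rot(x), self.shape(x), self.cam(x), self.amp(x)

class AJAHRDecoderSketch(nn.Module):
    """Decoder with two cross-attention pathways: a learned zero-pose token attends
    to ViT image embeddings, then to the classifier (BPAC-Net) token."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.zero_pose_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.img_xattn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cls_xattn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.branches = nn.ModuleDict({
            r: RegionBranch(dim, n) for r, n in
            [("left_arm", 4), ("right_arm", 4), ("left_leg", 4), ("right_leg", 4)]
        })

    def forward(self, img_tokens, cls_token):
        # img_tokens: (B, N, dim) ViT embeddings; cls_token: (B, 1, dim) from BPAC-Net
        q = self.zero_pose_token.expand(img_tokens.size(0), -1, -1)
        q, _ = self.img_xattn(q, img_tokens, img_tokens)   # pathway 1: image features
        q, _ = self.cls_xattn(q, cls_token, cls_token)     # pathway 2: classifier semantics
        return {r: b(q.squeeze(1)) for r, b in self.branches.items()}

# usage: out = AJAHRDecoderSketch()(torch.randn(2, 196, 256), torch.randn(2, 1, 256))
```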
Body-Part Amputation Classifier Integration
BPAC-Net, a dedicated module, conducts limb presence/absence classification. It ingests both RGB images and 2D keypoint heatmaps, encoding features via a ResNet-32 backbone with Convolutional Block Attention Module (CBAM) enhancement. Four classification heads produce binary amputation indicators, which dictate subsequent pose processing. Specifically, BPAC-Net's outputs select between two pre-trained tokenizers: the amputation-aware codebook (trained on both amputee and non-amputee data) and a non-amputee-only codebook. The mesh recovery network then predicts SMPL pose parameters, with absent limb parameters set to a zero matrix, encoding anatomical loss directly in the hierarchical mesh representation.
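A compact stand-in for BPAC-Net illustrates the input fusion and per-limb classification heads. The small convolutional encoder below replaces the ResNet+CBAM encoder, and the class convention (index 1 = amputated) is an assumption.

```python
import torch
import torch.nn as nn

class BPACNetSketch(nn.Module):
    """Fuses an RGB image with 2D keypoint heatmaps and predicts a
    presence/absence logit pair for each of the four limbs."""
    def __init__(self, n_keypoints: int = 17, dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3 + n_keypoints, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # one binary classification head per limb
        self.heads = nn.ModuleList([nn.Linear(dim, 2) for _ in range(4)])

    def forward(self, rgb, heatmaps):
        # rgb: (B, 3, H, W); heatmaps: (B, K, H, W), resized to the image resolution
        feats = self.encoder(torch.cat([rgb, heatmaps], dim=1))
        return torch.stack([h(feats) for h in self.heads], dim=1)  # (B, 4, 2)

# logits = BPACNetSketch()(torch.randn(2, 3, 256, 256), torch.randn(2, 17, 256, 256))
# amputation = logits.argmax(dim=-1)  # 1 = limb predicted amputated (assumed convention)
```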
2. Adaptive Pose Estimation and Training Methodology
AJAHR leverages a dual-tokenizer mechanism conditioned by amputation prediction. The estimated amputation vector from BPAC-Net determines which codebook is used for post-processing:
- For limbs predicted as amputated, the mesh parameters are recovered using the amputation-aware (amputee) codebook.
- For limbs predicted as present, the non-amputee codebook is employed.
The SMPL model generates the final mesh, with absent limbs encoded by zeroing the corresponding joint and descendant pose parameters, which structurally collapses the affected mesh vertices (see the sketch below). Training is conducted end-to-end, jointly fitting BPAC-Net and the mesh recovery subnetwork. BPAC-Net is optimized with a cross-entropy loss on limb presence, while the mesh branch predicts pose in the stable 6D rotation representation and is supervised with losses on mesh vertices, 2D and 3D joint positions, pose, and shape parameters. The complete objective is a weighted sum of these terms:

$$\mathcal{L} = \lambda_{\text{cls}}\mathcal{L}_{\text{cls}} + \lambda_{V}\mathcal{L}_{V} + \lambda_{3D}\mathcal{L}_{3D} + \lambda_{2D}\mathcal{L}_{2D} + \lambda_{\theta}\mathcal{L}_{\theta} + \lambda_{\beta}\mathcal{L}_{\beta}$$
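The zero-encoding referenced above can be illustrated with a short sketch that zeroes the axis-angle rotations of an amputated limb's joints and their descendants before the SMPL forward pass. The joint indices follow the standard SMPL ordering; the exact amputation levels supported by AJAHR may differ.

```python
import torch

# Standard SMPL joint indices for each limb chain (shoulder->hand, hip->foot).
LIMB_CHAINS = {
    "left_arm":  [16, 18, 20, 22],   # l_shoulder, l_elbow, l_wrist, l_hand
    "right_arm": [17, 19, 21, 23],   # r_shoulder, r_elbow, r_wrist, r_hand
    "left_leg":  [1, 4, 7, 10],      # l_hip, l_knee, l_ankle, l_foot
    "right_leg": [2, 5, 8, 11],      # r_hip, r_knee, r_ankle, r_foot
}

def zero_amputated_pose(pose: torch.Tensor, amputated: dict) -> torch.Tensor:
    """pose: (B, 24, 3) full-body SMPL axis-angle parameters (incl. global orient);
    amputated: {limb_name: (B,) bool mask from BPAC-Net}. Rotations of an
    amputated limb's joints and descendants are zeroed so the SMPL mesh
    collapses that region instead of hallucinating a limb."""
    pose = pose.clone()
    for limb, mask in amputated.items():
        for j in LIMB_CHAINS[limb]:
            pose[mask, j] = 0.0
    return pose

# Example: batch of 2, right leg amputated in the second sample.
flags = {"left_arm": torch.tensor([False, False]),
         "right_arm": torch.tensor([False, False]),
         "left_leg": torch.tensor([False, False]),
         "right_leg": torch.tensor([False, True])}
pose = zero_amputated_pose(torch.randn(2, 24, 3), flags)
```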
Cross-attention components facilitate information transfer from BPAC-Net feature maps to the pose decoder, especially when one or more limbs are absent, stabilizing pose estimation in structurally ambiguous cases.
3. Amputee 3D (A3D) Synthetic Dataset Design
To address dataset scarcity, AJAHR incorporates the A3D dataset, composed of more than 1 million synthetic amputee images. Its construction follows a multi-stage synthesis pipeline:
- Human pose data from Human3.6M, MPII, and MSCOCO are processed by ScoreHMR for SMPL parameter inference.
- An index selection module sets amputated joint pose parameters to zero matrices, thereby structurally removing limbs in the mesh.
- Visual assets from BEDLAM provide realistic skin and clothing, with demographic balancing for ethnicity and gender.
- Clean backgrounds are generated using human segmentation (SAM) and image inpainting (LaMa).
Each mesh is fully annotated with SMPL parameters, 2D/3D joint coordinates, and explicit amputation labels. This process synthesizes a diversity of limb-loss types (missing hand, forearm, full arm, ankle, knee, whole leg, etc.), ensuring coverage of structurally absent joint cases. Augmentation with A3D improves model generalization to real in-the-wild amputee images by providing supervised data on anatomically missing regions.
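For orientation, a hypothetical per-image annotation record mirroring the fields A3D is described as providing might look as follows; the field names and shapes are illustrative assumptions, not the released format.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class A3DSample:
    """Illustrative annotation record for one synthetic A3D image."""
    image_path: str
    betas: np.ndarray          # (10,)  SMPL shape parameters
    pose: np.ndarray           # (24, 3) SMPL axis-angle pose, zeroed at amputated joints
    joints_3d: np.ndarray      # (J, 3) 3D joint positions
    joints_2d: np.ndarray      # (J, 2) 2D projections in image coordinates
    amputation: dict = field(default_factory=dict)  # e.g. {"left_arm": "forearm"}
```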
4. Quantitative Evaluation and Comparative Results
Evaluation of AJAHR is conducted on amputee datasets (A3D, ITW-amputee) and standard non-amputee datasets (EMDB, 3DPW). Metrics include Mean Vertex Error (MVE), Mean Per Joint Position Error (MPJPE), and Procrustes-Aligned MPJPE (PA-MPJPE). AJAHR consistently achieves lower errors on amputee datasets than TokenHMR, HMR2.0, and BEDLAM-CLIFF, reducing mesh hallucination and misinterpretation of missing limbs. The amputation-aware mechanism (BPAC-Net plus conditional tokenizer selection) produces more anatomically faithful reconstructions.
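These metrics are standard in the human mesh recovery literature; for reference, MPJPE and PA-MPJPE can be computed as below. MVE is the same per-point computation applied to mesh vertices instead of joints.

```python
import numpy as np

def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Per Joint Position Error for (J, 3) arrays in the same units (e.g. mm)."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def pa_mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """MPJPE after similarity (Procrustes) alignment of the prediction onto GT."""
    mu_p, mu_g = pred.mean(0), gt.mean(0)
    x, y = pred - mu_p, gt - mu_g
    u, s, vt = np.linalg.svd(x.T @ y)        # SVD of the cross-covariance matrix
    r = vt.T @ u.T                           # optimal rotation (Kabsch)
    if np.linalg.det(r) < 0:                 # fix an improper rotation (reflection)
        vt[-1] *= -1
        s[-1] *= -1
        r = vt.T @ u.T
    scale = s.sum() / (x ** 2).sum()         # optimal isotropic scale
    aligned = scale * x @ r.T + mu_g
    return mpjpe(aligned, gt)
```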
On non-amputee datasets, AJAHR maintains competitive accuracy, evidencing that adaptivity for amputee cases does not degrade canonical pose recovery. Tabulated performance reveals distinct advances over prior works in both amputee and mixed-population settings.
| Method | Amputee MVE | Amputee MPJPE | Non-amputee MPJPE |
|---|---|---|---|
| TokenHMR | Higher | Higher | Competitive |
| BEDLAM-CLIFF | Higher | Higher | Competitive |
| AJAHR | Lower | Lower | Competitive |
This table summarizes the relative metric performance as described in the original results.
5. Critical Technical Details
BPAC-Net determines limb presence/absence for each body part, producing a binary amputation indicator per limb from the RGB image $I$ and the 2D keypoint heatmaps $H$:

$$\hat{a}_i = f_{\text{BPAC}}(I, H)_i \in \{0, 1\}, \qquad i \in \{\text{left arm},\ \text{right arm},\ \text{left leg},\ \text{right leg}\}$$
(Equation 1)
The tokenizer switching mechanism operates per limb: the amputation-aware (amputee) codebook is selected when a limb is predicted amputated, and the non-amputee codebook otherwise:

$$\mathcal{C}_i = \begin{cases}\mathcal{C}_{\text{amp}}, & \hat{a}_i = 1 \\ \mathcal{C}_{\text{non-amp}}, & \hat{a}_i = 0\end{cases}$$
(Equation 2)
Tokenizers follow a VQ-VAE scheme with codebooks quantizing latent pose representations; the tokenizer objective combines reconstruction ($\mathcal{L}_{\text{recon}}$), codebook ($\mathcal{L}_{\text{codebook}}$), and commitment ($\mathcal{L}_{\text{commit}}$) terms:

$$\mathcal{L}_{\text{tok}} = \mathcal{L}_{\text{recon}} + \mathcal{L}_{\text{codebook}} + \beta\,\mathcal{L}_{\text{commit}}$$
(Equation 3)
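A minimal sketch of this VQ-VAE objective with straight-through quantization follows, assuming a flat codebook and MSE losses; the actual encoder/decoder architecture and loss weights used by AJAHR are not specified here.

```python
import torch
import torch.nn.functional as F

def vq_losses(z_e, codebook, pose_in, decode, beta: float = 0.25):
    """z_e: (B, D) encoder output; codebook: (K, D); decode: callable mapping the
    quantized code back to a pose vector; pose_in: (B, P) pose to reconstruct."""
    dists = torch.cdist(z_e, codebook)            # (B, K) distances to codebook entries
    z_q = codebook[dists.argmin(dim=-1)]          # nearest-neighbor quantization, (B, D)
    z_st = z_e + (z_q - z_e).detach()             # straight-through estimator for gradients
    recon = F.mse_loss(decode(z_st), pose_in)     # L_recon
    codebook_loss = F.mse_loss(z_q, z_e.detach()) # L_codebook
    commit = F.mse_loss(z_e, z_q.detach())        # L_commit
    return recon + codebook_loss + beta * commit

# Example with a trivial linear "decoder" (illustrative only):
# decode = torch.nn.Linear(64, 72)
# loss = vq_losses(torch.randn(8, 64), torch.randn(512, 64), torch.randn(8, 72), decode)
```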
The system's use of the SMPL model enables amputation encoding (by zeroing hierarchical joint parameters), resulting in mesh collapse of relevant regions, which is crucial for avoiding hallucinated limb predictions. BPAC-Net’s ResNet-32 backbone with CBAM provides spatial and channel-level feature attention, robustly integrating RGB and keypoint modalities.
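For reference, CBAM applies channel attention followed by spatial attention; the compact module below follows the original formulation (Woo et al., 2018) and is not specific to AJAHR's ResNet-32 integration.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel then spatial attention."""
    def __init__(self, channels: int, reduction: int = 16, kernel: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel, padding=kernel // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        # channel attention from global average- and max-pooled descriptors
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # spatial attention from per-pixel channel statistics
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```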
6. Prospects for Extension and Broader Applications
Current support is limited to joint-aligned amputations consistent with the SMPL kinematic hierarchy. The framework's authors anticipate extensions for prosthetic integration and irregular amputation geometries, such as partial or non-joint-aligned losses. Applications in sports analytics for Paralympic athletes, inclusive AR/VR systems, and human–computer interface technologies are envisaged, broadening the societal and technological impact of anatomically inclusive mesh recovery.
Efforts to improve robustness may involve incorporation of real-world amputee annotated data, advanced generative synthesis methods, and refinement of BPAC-Net to reduce ambiguity between occlusion and true amputation cues. These directions are likely to further increase the fidelity and realism of pose and mesh reconstruction for anatomically diverse subjects.
7. Summary
AJAHR advances 3D human mesh recovery via explicit modeling of limb amputation, introducing a classifier-guided dual-tokenizer architecture and a comprehensive synthetic amputee dataset. Technical innovations—hierarchical zero-encoding, semantic codebook switching, and joint classifier-pose training—yield state-of-the-art results on both amputee and non-amputee datasets. This framework provides a foundational paradigm for anatomically adaptive human mesh recovery, significantly enhancing model inclusivity and performance in underrepresented populations (Cho et al., 24 Sep 2025).