
AJAHR: Amputated Joint Aware 3D Mesh Recovery

Updated 26 September 2025
  • The paper introduces an innovative framework that integrates BPAC-Net for limb classification with a dual-tokenizer mechanism to address anatomical diversity.
  • It employs a Vision Transformer backbone and SMPL model with zero-encoding for missing limbs, ensuring robust and anatomically faithful mesh reconstruction.
  • The system leverages the large-scale A3D synthetic dataset to train and validate performance, achieving lower errors on amputee cases compared to traditional methods.

Amputated Joint Aware 3D Human Mesh Recovery (AJAHR) is an adaptive framework for 3D human pose and mesh reconstruction designed to address anatomical diversity, specifically limb loss. Traditional human mesh recovery models presume a canonical body structure and consequently underperform on amputee subjects, a problem exacerbated by the scarcity of specialized datasets. AJAHR introduces an architecture that combines an amputation-aware classifier (BPAC-Net), a dual-tokenizer pose estimation strategy, and a large-scale synthetic amputee dataset (A3D), enabling robust mesh recovery for both amputee and non-amputee populations.

1. System Architecture and Core Components

AJAHR consists of a Vision Transformer (ViT) backbone that produces image embeddings consumed by a Transformer decoder with two cross-attention pathways: one for initializing pose tokens (the zero-pose token) and another for integrating semantic information (the classifier token). The decoder outputs are split into region-specific branches covering the left arm, right arm, left leg, and right leg; each branch handles region-specific regression (rotation, shape, camera) and amputation classification.
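As a rough, framework-free illustration (not the authors' implementation), one cross-attention step from decoder tokens to ViT patch embeddings can be sketched in NumPy; all dimensions here are toy placeholders:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, d_k):
    """Attend from decoder tokens (queries) to image embeddings (context)."""
    scores = queries @ context.T / np.sqrt(d_k)   # (n_tokens, n_patches)
    weights = softmax(scores, axis=-1)            # rows sum to 1
    return weights @ context                      # (n_tokens, d_k)

rng = np.random.default_rng(0)
pose_tokens = rng.normal(size=(4, 64))    # one toy token per limb region
image_embed = rng.normal(size=(196, 64))  # toy 14x14 ViT patch embeddings
out = cross_attention(pose_tokens, image_embed, d_k=64)
print(out.shape)  # (4, 64)
```

In the full model this step appears twice (zero-pose token path and classifier token path) with learned projections; the sketch omits those for brevity.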

Body-Part Amputation Classifier Integration

BPAC-Net, a dedicated module, conducts limb presence/absence classification. It ingests both RGB images and 2D keypoint heatmaps, encoding features via a ResNet-32 backbone with Convolutional Block Attention Module (CBAM) enhancement. Four classification heads produce binary amputation indicators, which dictate subsequent pose processing. Specifically, BPAC-Net's outputs select between two pre-trained tokenizers: the amputation-aware codebook (trained on both amputee and non-amputee data) and a non-amputee-only codebook. The mesh recovery network then predicts SMPL pose parameters, with absent limb parameters set to a zero matrix, encoding anatomical loss directly in the hierarchical mesh representation.
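A minimal sketch of how the four classification heads might be reduced to binary amputation indicators via an argmax rule, where class 0 means "limb present"; the logits are toy values, not BPAC-Net outputs:

```python
import numpy as np

def amputation_vector(head_logits):
    """Each row holds one limb head's class logits; class 0 means 'present'.
    Returns a binary indicator per limb: 1 if the limb is amputated."""
    return (np.argmax(head_logits, axis=1) != 0).astype(int)

# Toy logits for [left arm, right arm, left leg, right leg]
logits = np.array([
    [2.0, 0.1],   # present
    [0.2, 1.5],   # amputated
    [3.0, -1.0],  # present
    [1.0, 0.9],   # present
])
y_hat = amputation_vector(logits)
print(y_hat)  # [0 1 0 0]
```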

2. Adaptive Pose Estimation and Training Methodology

AJAHR leverages a dual-tokenizer mechanism conditioned on the amputation prediction. The estimated amputation vector $\hat{y}$ from BPAC-Net determines which codebook is used for post-processing:

  • For amputation states ($\|\hat{y}\|_1 > 0$), the mesh parameters are recovered using the amputee codebook.
  • For full limb presence, the non-amputee codebook is employed.
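A minimal sketch of this routing rule in NumPy, using toy codebooks (shapes and contents are placeholders, not the paper's learned codebooks):

```python
import numpy as np

def select_codebook(y_hat, codebook_amp, codebook_non_amp):
    """Route to the amputee codebook when any limb is predicted absent
    (||y_hat||_1 > 0); otherwise use the non-amputee codebook."""
    if np.abs(y_hat).sum() > 0:
        return codebook_amp
    return codebook_non_amp

C_amp = np.full((512, 6), 1.0)    # toy amputation-aware codebook
C_non = np.full((512, 6), -1.0)   # toy non-amputee-only codebook

assert select_codebook(np.array([0, 1, 0, 0]), C_amp, C_non) is C_amp
assert select_codebook(np.array([0, 0, 0, 0]), C_amp, C_non) is C_non
```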

The SMPL model generates the final mesh, with absent limbs encoded by zeroing the corresponding joint and descendant pose parameters, structurally collapsing the affected mesh vertices. Training is conducted end-to-end, jointly fitting BPAC-Net and the mesh recovery subnetwork. BPAC-Net is optimized with a cross-entropy classification loss on limb presence, while mesh regression uses the stable 6D rotation representation for pose estimates and $\ell_2$ losses on mesh reconstruction, 2D and 3D joint positions, and shape parameters. The complete objective is:

$$L_{\text{overall}} = \lambda_\theta L_\theta(\theta, \hat{\theta}) + \lambda_\beta L_\beta(\beta, \hat{\beta}) + \lambda_{2D} L_{2D}(J_{2D}, \hat{J}_{2D}) + \lambda_{3D} L_{3D}(J_{3D}, \hat{J}_{3D}) + \lambda_{cls} L_{cls}$$
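As a toy illustration, the overall objective is simply a weighted sum of the individual loss terms; the numeric loss values and λ weights below are placeholders, not values from the paper:

```python
def overall_loss(losses, weights):
    """Weighted sum of per-term losses, mirroring L_overall."""
    return sum(weights[k] * losses[k] for k in losses)

# Toy per-term loss values and placeholder lambda weights
losses  = {"theta": 0.30, "beta": 0.10, "j2d": 0.50, "j3d": 0.40, "cls": 0.05}
weights = {"theta": 1.0,  "beta": 0.5,  "j2d": 1.0,  "j3d": 1.0,  "cls": 1.0}

L = overall_loss(losses, weights)
print(L)  # 1.3
```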

Cross-attention components facilitate information transfer from BPAC-Net feature maps to the pose decoder, especially when one or more limbs are absent, stabilizing pose estimation in structurally ambiguous cases.
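The 6D rotation representation used for pose regression (a standard continuous parameterization in mesh recovery, not specific to AJAHR) maps six numbers to a valid rotation matrix via Gram–Schmidt orthogonalization; a sketch:

```python
import numpy as np

def rot6d_to_matrix(d6):
    """Gram-Schmidt: treat the 6 values as the first two columns of a
    rotation matrix, orthonormalize them, and complete with a cross product."""
    a1, a2 = d6[:3], d6[3:]
    b1 = a1 / np.linalg.norm(a1)
    a2_proj = a2 - (b1 @ a2) * b1      # remove the component along b1
    b2 = a2_proj / np.linalg.norm(a2_proj)
    b3 = np.cross(b1, b2)              # right-handed third column
    return np.stack([b1, b2, b3], axis=1)

R = rot6d_to_matrix(np.array([1.0, 0.1, 0.0, 0.0, 1.0, 0.2]))
# R is orthonormal with determinant +1
print(np.allclose(R.T @ R, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))
```

The representation avoids the discontinuities of axis-angle and quaternion parameterizations, which is why it is described as "stable" for regression.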

3. Amputee 3D (A3D) Synthetic Dataset Design

To address dataset scarcity, AJAHR incorporates the A3D dataset, composed of more than 1 million synthetic amputee images. Its construction follows a multi-stage synthesis pipeline:

  1. Human pose data from Human3.6M, MPII, and MSCOCO are processed by ScoreHMR for SMPL parameter inference.
  2. An index selection module sets amputated joint pose parameters to zero matrices, thereby structurally removing limbs in the mesh.
  3. Visual assets from BEDLAM provide realistic skin and clothing, with demographic balancing for ethnicity and gender.
  4. Clean backgrounds are generated using human segmentation (SAM) and image inpainting (LaMa).
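Step 2 above can be sketched as zeroing a joint's axis-angle parameters together with all of its kinematic descendants. The parent table here is a hypothetical six-joint fragment, not SMPL's actual 24-joint hierarchy:

```python
import numpy as np

# Hypothetical mini kinematic tree: joint index -> parent index (-1 = root)
PARENT = {0: -1, 1: 0, 2: 1, 3: 2, 4: 0, 5: 4}

def descendants(joint, parent):
    """All joints whose ancestor chain includes `joint` (inclusive)."""
    out = {joint}
    changed = True
    while changed:
        changed = False
        for j, p in parent.items():
            if p in out and j not in out:
                out.add(j)
                changed = True
    return out

def zero_amputated(pose, joint, parent):
    """Zero the 3 axis-angle params of `joint` and every descendant,
    collapsing the corresponding mesh region."""
    pose = pose.copy()
    for j in descendants(joint, parent):
        pose[j] = 0.0
    return pose

pose = np.ones((6, 3))                 # toy per-joint axis-angle parameters
new_pose = zero_amputated(pose, 1, PARENT)
print(sorted(descendants(1, PARENT)))  # [1, 2, 3]
```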

Each mesh is fully annotated with SMPL parameters, 2D/3D joint coordinates, and explicit amputation labels. This process synthesizes a diversity of limb-loss types (missing hand, forearm, full arm, ankle, knee, whole leg, etc.), ensuring coverage of structurally absent joint cases. Augmentation with A3D improves model generalization to real in-the-wild amputee images by providing supervised data on anatomically missing regions.

4. Quantitative Evaluation and Comparative Results

Evaluation of AJAHR is conducted on amputee datasets (A3D, ITW-amputee) and standard non-amputee datasets (EMDB, 3DPW). Metrics include Mean Vertex Error (MVE), Mean Per Joint Position Error (MPJPE), and Procrustes-Aligned MPJPE (PA-MPJPE). AJAHR consistently achieves lower errors on amputee datasets when compared to TokenHMR, HMR2.0, and BEDLAM-CLIFF, clearly reducing mesh hallucination and misinterpretation of missing limbs. The amputation-aware mechanism (BPAC-Net + conditional tokenizer selection) produces more anatomically faithful reconstructions.
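For reference, MPJPE is the mean Euclidean distance between predicted and ground-truth joints, and PA-MPJPE first aligns the prediction to the ground truth with a similarity (Procrustes) transform; a generic NumPy sketch, not the paper's evaluation code:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error (same units as the inputs)."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def pa_mpjpe(pred, gt):
    """MPJPE after similarity (Procrustes) alignment of pred onto gt."""
    mu_p, mu_g = pred.mean(0), gt.mean(0)
    P, G = pred - mu_p, gt - mu_g
    U, S, Vt = np.linalg.svd(P.T @ G)
    sign = np.sign(np.linalg.det(U @ Vt))
    S[-1] *= sign
    U[:, -1] *= sign                       # avoid reflections
    R = U @ Vt
    s = S.sum() / (P ** 2).sum()           # optimal isotropic scale
    aligned = s * P @ R + mu_g
    return mpjpe(aligned, gt)

# A rotated, scaled, translated copy should have near-zero PA-MPJPE
rng = np.random.default_rng(1)
gt = rng.normal(size=(17, 3))
theta = 0.3
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
pred = 2.0 * gt @ Rz + np.array([0.5, -0.2, 1.0])
print(pa_mpjpe(pred, gt) < 1e-8)  # True
```

MVE is computed the same way as MPJPE but over all mesh vertices rather than a sparse joint set.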

On non-amputee datasets, AJAHR maintains competitive accuracy, demonstrating that adaptivity for amputee cases does not degrade canonical pose recovery. Tabulated performance shows distinct advances over prior work in both amputee and mixed-population settings.

| Method       | Amputee MVE | Amputee MPJPE | Non-amputee MPJPE |
|--------------|-------------|---------------|-------------------|
| TokenHMR     | Higher      | Higher        | Competitive       |
| BEDLAM-CLIFF | Higher      | Higher        | Competitive       |
| AJAHR        | Lower       | Lower         | Competitive       |

This table summarizes the relative metric performance as described in the original results.

5. Critical Technical Details

BPAC-Net determines limb presence/absence for each body part $p$ via

$$\hat{y}_p = \begin{cases} 0 & \text{if } \arg\max(h_p) = 0 \\ 1 & \text{otherwise} \end{cases}$$

(Equation 1)

The tokenizer switching mechanism operates as:

  • If $\|\hat{y}\|_1 > 0$, then $\hat{\theta} = \sigma(T) \times C_{amp}$
  • Otherwise, $\hat{\theta} = \sigma(T) \times C_{non\_amp}$

(Equation 2)

Tokenizers follow a VQ-VAE scheme with codebooks quantizing latent pose representations; losses include reconstruction ($L_{mix}$), codebook ($\|sg[Z]-\tilde{Z}\|^2$), and commitment ($\|Z-sg[\tilde{Z}]\|^2$) terms:

$$L_{total} = \lambda_{mix} L_{mix} + \lambda_{cb} \|sg[Z] - \tilde{Z}\|^2 + \lambda_{com} \|Z - sg[\tilde{Z}]\|^2$$

(Equation 3)
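In an autodiff framework, sg[·] (stop-gradient) blocks gradients without changing values; the NumPy sketch below therefore only illustrates the numeric loss values, with placeholder λ weights and toy latents:

```python
import numpy as np

def vq_losses(z, z_q):
    """Codebook term ||sg[Z]-Z_q||^2 and commitment term ||Z-sg[Z_q]||^2.
    Without autodiff, sg[.] is the identity on values, so both terms share
    the same numeric value but would drive different gradients (codebook
    term updates z_q, commitment term updates the encoder's z)."""
    codebook = np.sum((z - z_q) ** 2)
    commit = np.sum((z - z_q) ** 2)
    return codebook, commit

z = np.array([0.5, -1.0, 2.0])     # toy encoder latent
z_q = np.array([0.0, -1.0, 2.5])   # toy nearest codebook entry

cb, cm = vq_losses(z, z_q)
L_mix = 0.8                        # placeholder reconstruction loss
L_total = 1.0 * L_mix + 1.0 * cb + 0.25 * cm   # placeholder lambdas
print(cb, L_total)  # 0.5 1.425
```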

The system's use of the SMPL model enables amputation encoding (by zeroing hierarchical joint parameters), resulting in mesh collapse of relevant regions, which is crucial for avoiding hallucinated limb predictions. BPAC-Net’s ResNet-32 backbone with CBAM provides spatial and channel-level feature attention, robustly integrating RGB and keypoint modalities.

6. Prospects for Extension and Broader Applications

Current support is limited to joint-aligned amputations consistent with the SMPL kinematic hierarchy. The framework's authors anticipate extensions for prosthetic integration and irregular amputation geometries, such as partial or non-joint-aligned losses. Applications in sports analytics for Paralympic athletes, inclusive AR/VR systems, and human–computer interface technologies are envisaged, broadening the societal and technological impact of anatomically inclusive mesh recovery.

Efforts to improve robustness may involve incorporation of real-world amputee annotated data, advanced generative synthesis methods, and refinement of BPAC-Net to reduce ambiguity between occlusion and true amputation cues. These directions are likely to further increase the fidelity and realism of pose and mesh reconstruction for anatomically diverse subjects.

7. Summary

AJAHR advances 3D human mesh recovery via explicit modeling of limb amputation, introducing a classifier-guided dual-tokenizer architecture and a comprehensive synthetic amputee dataset. Technical innovations—hierarchical zero-encoding, semantic codebook switching, and joint classifier-pose training—yield state-of-the-art results on both amputee and non-amputee datasets. This framework provides a foundational paradigm for anatomically adaptive human mesh recovery, significantly enhancing model inclusivity and performance in underrepresented populations (Cho et al., 24 Sep 2025).
