DAMA: Disentangled Body-Anchored Gaussians for Controllable Multi-Layered Avatars

Published 20 May 2026 in cs.CV | (2605.21001v1)

Abstract: Existing 3D clothed avatar reconstruction methods achieve high visual fidelity but ignore geometric structure and physical plausibility. They either model clothed humans as a single deformable surface or attempt garment disentanglement without enforcing geometric constraints, resulting in ambiguous garment boundaries and no control over stacking or layer ordering. To address these limitations, we introduce DAMA (Disentangled body-Anchored Gaussians for Controllable Multi-layered Avatars), a 3D avatar reconstruction method that produces physically plausible clothed avatars through a dedicated representation and reconstruction method. At the representation level, we bind Gaussians to SMPL-X faces using barycentric in-plane coordinates and a positive normal offset. Based on this parameterization, the reconstruction method lifts 2D segmentations to body-anchored Gaussians, refines layers using topology-guided correction, and jointly optimizes geometry and appearance. DAMA is the first Gaussian avatar reconstruction method from multi-view images to achieve physically plausible layering, clean garment separation, and explicit stacking control. On the full 4D-DRESS dataset (82 scans), it achieves state-of-the-art performance in geometry reconstruction, garment separation, penetration rate, and penetration depth. The representation further supports user-defined garment reordering and fast conversion of body-conforming garments to simulation-ready meshes. Project Page: https://danieleskandar.github.io/dama/


Authors (4)

Summary

The paper introduces a novel avatar reconstruction method using body-anchored Gaussians and positive normal offsets to ensure intersection-free, physically plausible garment layering.
It employs a pipeline that integrates multi-view segmentation lifting, topology-aware label refinement, and independent layer optimization for robust geometric accuracy and semantic control.
Experimental results on the 4D-DRESS dataset demonstrate state-of-the-art performance, notably reducing garment-body intersections, and the representation supports extraction of simulation-ready meshes.

DAMA: Physically Plausible Layered Avatars with Disentangled Body-Anchored Gaussians

Introduction

The DAMA framework proposes a novel method for the reconstruction of multi-layered, physically plausible human avatars from multi-view RGB imagery and corresponding segmentation masks. Unlike prior approaches that focus solely on visual fidelity or poseable geometry, DAMA introduces a representation and reconstruction pipeline that enforces geometric and physical plausibility by design—specifically targeting clean garment separation and explicit stacking order. DAMA’s key innovation lies in the anchoring of anisotropic Gaussians to SMPL-X faces via barycentric coordinates and strictly positive normal offsets, enabling intersection-free garments, semantic identity preservation under deformation, and user-defined garment stacking and reordering.

Existing neural avatar models, including volumetric radiance fields and Gaussian splatting avatars, prioritize appearance modeling and generic pose control while neglecting explicit garment decomposition and hierarchical layering [kerbl3Dgaussians, huang20242dgs]. Some disentanglement works, such as GALA [kim2024gala] and Disco4D [pang2025disco4d], attempt garment isolation via 2D mask lifting and optimization-based intersection penalties. These approaches, however, produce ambiguous segmentation boundaries and interpenetrating layers, and they do not encode layer ordering as an explicit geometric constraint.

Template-driven garment separation [pons2017clothcap, bhatnagar2019multi] achieves multi-layered disentanglement but is restricted by fixed templates and the need for high-quality 3D input. The recent Gaussian Wardrobe [chen2026gaussianwardrobe] permits stacking but locks the layer order during training and resolves collisions post hoc in image space, which rules out garment reordering at inference.

DAMA aims to overcome these limitations via direct geometric encoding: each Gaussian is bound to a mesh face and placed with a strictly positive normal offset, which enforces garment separation, surface adherence, and intersection avoidance. Topology-aware label refinement further improves segmentation robustness, especially in self-occluded regions.

Methodology

Anchored Gaussian Representation

DAMA anchors each Gaussian to a specific SMPL-X face using barycentric coordinates within the local face and a positive normal offset $\delta_i^l > 0$, ensuring that all garment layer Gaussians lie exterior to the body and preceding layers. This formulation prevents lateral drift, maintains semantic assignment through articulations, and eliminates interpenetration.

The Gaussian mean for layer $l$ is given by:

$\boldsymbol{\mu}_i^l = \sum_{k=1}^3 b_{ik}^l \mathbf{v}_k + \delta_i^l \sum_{k=1}^3 b_{ik}^l \mathbf{n}_k$

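where $b_{ik}^l$ are barycentric coordinates, $\mathbf{v}_k$ and $\mathbf{n}_k$ are the positions and normals of the three vertices of the anchoring SMPL-X face, and $\delta_i^l > 0$ is the normal offset.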
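The formula above can be evaluated directly for a single Gaussian. The following is a minimal sketch in plain Python, not the authors' implementation: the function name `gaussian_mean`, the input checks, and the toy triangle are illustrative assumptions, and the normals are taken to be per-vertex normals as the sum over $k$ suggests.

```python
from typing import Sequence

Vec3 = tuple[float, float, float]


def gaussian_mean(bary: Sequence[float],
                  verts: Sequence[Vec3],
                  normals: Sequence[Vec3],
                  delta: float) -> Vec3:
    """Mean of one body-anchored Gaussian on a triangle (hypothetical helper).

    bary    -- barycentric coordinates (b_1, b_2, b_3) inside the face
    verts   -- the three face vertices v_k
    normals -- the three vertex normals n_k
    delta   -- normal offset, required to be strictly positive
    """
    if delta <= 0.0:
        raise ValueError("normal offset must be strictly positive")
    if abs(sum(bary) - 1.0) > 1e-6 or min(bary) < 0.0:
        raise ValueError("barycentric coordinates must be non-negative and sum to 1")
    # In-plane anchor: sum_k b_k * v_k
    point = [sum(b * v[a] for b, v in zip(bary, verts)) for a in range(3)]
    # Interpolated (unnormalized) normal: sum_k b_k * n_k, as in the formula
    normal = [sum(b * n[a] for b, n in zip(bary, normals)) for a in range(3)]
    return tuple(p + delta * n for p, n in zip(point, normal))


# Toy triangle in the z = 0 plane with all normals pointing along +z.
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
normals = [(0.0, 0.0, 1.0)] * 3
mu = gaussian_mean((0.25, 0.25, 0.5), verts, normals, delta=0.01)
print(mu)  # -> (0.25, 0.5, 0.01)
```

Because the barycentric coordinates stay inside the face and the offset is positive, the mean can only move along the interpolated normal, away from the body surface. When the mesh is reposed, re-evaluating the same function with the posed vertices and normals moves the Gaussian with its face.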

Rotation is stored as a quaternion relative to the SMPL-X face orientation, and scales and opacities are set to capture locally isotropic geometry and visibility across the mesh.
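One way to read the relative-rotation parameterization is as a quaternion product between the face orientation and a per-Gaussian relative quaternion. The sketch below assumes a `(w, x, y, z)` layout and the composition order `q_face * q_rel`; neither is stated in the summary, and how the face orientation itself is derived from the triangle is left out. The names `quat_mul`, `normalize`, and `world_rotation` are hypothetical.

```python
import math

Quat = tuple[float, float, float, float]  # (w, x, y, z), assumed layout


def quat_mul(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b: rotation b is applied first, then a."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def normalize(q: Quat) -> Quat:
    """Rescale a quaternion to unit length so it represents a rotation."""
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)


def world_rotation(q_face: Quat, q_rel: Quat) -> Quat:
    """Compose the face orientation with the relative rotation (assumed order)."""
    return normalize(quat_mul(q_face, normalize(q_rel)))


half = math.sqrt(0.5)
q_face = (half, 0.0, 0.0, half)   # 90 degrees about z, a made-up face frame
q_rel = (1.0, 0.0, 0.0, 0.0)      # identity: Gaussian aligned with the face
print(world_rotation(q_face, q_rel))
```

Under this reading, only `q_rel` is a free parameter per Gaussian, and the world-space orientation follows the face automatically when the body mesh is posed.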

Reconstruction Pipeline

Segmentation Lifting: Multi-view image masks are lifted to SMPL-X-anchored Gaussians, providing initial geometry and label assignments. Label smoothness, scale, and normal alignment losses are used to stabilize optimization.
Topology-Aware Refinement: Labels are projected onto the mesh’s topology, and connected label regions below an area threshold are relabeled via majority vote among adjacent faces. This step rectifies occlusion- and segmentation-induced errors.
Fine Geometry and Appearance Optimization: Each semantic layer is extracted, initialized with multiple Gaussians per face, and optimized independently for geometry and appearance under masked supervision. Layer-level losses provide robust regularization and prevent color/geometry leakage.
Layer Composition and Animation: Layers are merged per user-defined order, with Gaussian means updated according to underlying stacking. The construction naturally supports SMPL-X pose animation by re-evaluating Gaussian parameters per posed mesh.
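The topology-aware refinement step above can be sketched as a connected-component pass over the face adjacency graph. This is an illustrative reading, not the authors' code: the function name `refine_labels`, the edge-sharing adjacency, the one-vote-per-shared-edge rule, and the toy strip of faces are all assumptions.

```python
from collections import Counter, deque


def refine_labels(labels, adjacency, areas, min_area):
    """Relabel small same-label islands by majority vote of bordering faces.

    labels    -- per-face label
    adjacency -- adjacency[f] lists the faces sharing an edge with face f
    areas     -- per-face area
    min_area  -- components with total area below this are treated as errors
    """
    refined = list(labels)
    seen = [False] * len(labels)
    for start in range(len(labels)):
        if seen[start]:
            continue
        # Breadth-first search over faces that share the start face's label.
        component, border = [], Counter()
        queue = deque([start])
        seen[start] = True
        while queue:
            f = queue.popleft()
            component.append(f)
            for g in adjacency[f]:
                if labels[g] != labels[start]:
                    border[labels[g]] += 1  # one vote per shared edge
                elif not seen[g]:
                    seen[g] = True
                    queue.append(g)
        # Only small components that actually border another label are changed.
        if border and sum(areas[f] for f in component) < min_area:
            winner = border.most_common(1)[0][0]
            for f in component:
                refined[f] = winner
    return refined


# A strip of six unit-area faces with one mislabeled face in the middle.
labels = ["shirt", "shirt", "pants", "shirt", "shirt", "shirt"]
adjacency = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
areas = [1.0] * 6
print(refine_labels(labels, adjacency, areas, min_area=1.5))
```

Components are found on the original labels and written to a copy, so the result does not depend on the order in which faces are visited.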

Applications

Garment Transfer and Stacking: Garments are transferred between avatars, utilizing underlying SMPL-X correspondence. Collisions are resolved by stacking offsets, and transferred garments are locally refined for appearance consistency.
Simulation-Ready Mesh Extraction: Gaussian layers yield meshes by averaging Gaussian means incident to each SMPL-X vertex; the mesh topology and ordering ensure intersection-free, physically consistent simulation input.
Hair Transfer and Layer Reordering: The representation extends to hair and shoes, facilitating layer-wise transfer, stacking, and user-defined reordering.
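The mesh extraction described above, averaging the Gaussian means incident to each SMPL-X vertex, can be sketched as follows. The sketch assumes that a Gaussian counts as incident to a vertex when the face it is bound to contains that vertex, and that the layer mesh reuses the body faces carrying the layer's Gaussians. The function name `layer_vertices` and the two-triangle example are hypothetical.

```python
from collections import defaultdict


def layer_vertices(faces, gaussian_face, gaussian_mean):
    """Average Gaussian means over the faces incident to each vertex.

    faces         -- faces[f] = (i, j, k) vertex indices of body-mesh face f
    gaussian_face -- gaussian_face[g] = index of the face Gaussian g is bound to
    gaussian_mean -- gaussian_mean[g] = (x, y, z) mean of Gaussian g
    Returns {vertex index: averaged position} for vertices touched by the layer.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for f, mean in zip(gaussian_face, gaussian_mean):
        # Each Gaussian contributes to all three vertices of its face.
        for v in faces[f]:
            for axis in range(3):
                sums[v][axis] += mean[axis]
            counts[v] += 1
    return {v: tuple(s / counts[v] for s in sums[v]) for v in sums}


# Two triangles sharing the edge (1, 2), one Gaussian on each face.
faces = [(0, 1, 2), (1, 3, 2)]
gaussian_face = [0, 1]
gaussian_mean = [(0.0, 0.0, 1.0), (0.0, 0.0, 3.0)]
vertices = layer_vertices(faces, gaussian_face, gaussian_mean)
print(vertices[1])  # -> (0.0, 0.0, 2.0)
```

Because vertex indices and face connectivity are inherited from the body mesh, every layer extracted this way shares the same topology, which is what makes the per-face stacking order carry over to the meshes.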

Quantitative and Qualitative Evaluation

On the 4D-DRESS dataset [wang20244ddress], DAMA achieves state-of-the-art performance in geometric accuracy and physical plausibility:

Method	Chamfer Dist. (mm) ↓	Penetration Rate (%) ↓	Penetration Depth (mm) ↓	PSNR ↑	LPIPS ↓
GALA	28.78	36.48	28.43	32.43	0.025
Disco4D	28.86	49.08	18.20	32.81	0.040
2DGS	21.88	51.83	8.91	35.21	0.031
DAMA	19.88	1.46	0.32	30.15	0.035

The representation yields near-zero garment–body intersections in both full avatars and individual garment layers (upper, lower, outer), improving on prior work by more than an order of magnitude in penetration rate and penetration depth. Segmentation and topology refinement prevent floating or mislabeled geometry. Appearance metrics reflect a trade-off with the strict geometric constraints: DAMA's PSNR (30.15) is the lowest of the four methods, while its LPIPS (0.035) falls between those of 2DGS (0.031) and Disco4D (0.040).
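As a quick check of the margin, the table values can be compared against the strongest baseline on each penetration metric (GALA for rate, 2DGS for depth), alongside the relative Chamfer improvement over 2DGS:

$\frac{36.48}{1.46} \approx 25.0, \qquad \frac{8.91}{0.32} \approx 27.8, \qquad \frac{21.88 - 19.88}{21.88} \approx 9.1\%$

The penetration metrics therefore improve by roughly 25 to 28 times, while the Chamfer distance gain over the best baseline is about 9%.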

Ablation studies demonstrate that positive normal offsets are essential for intersection-free layering, and topology-based label refinement is vital for segmentation accuracy. Loss ablations indicate the importance of isotropic scale and canonical alignment for robust geometry in occluded regions and under animation.

Implications and Future Directions

Practically, DAMA enables careful control over garment manipulation—including stacking, transfer, and conversion to simulation-ready geometry—all with explicit layer ordering. Theoretically, the parameterization advances the geometric encoding of avatars, improving physical plausibility and semantic stability under arbitrary deformations.

For generative models and virtual try-on scenarios, DAMA’s explicit stacking and garment ordering address the lack of compositional control seen in diffusion-based or radiance field avatars. This representation could integrate with generative pipelines to produce physically consistent avatars from text or monocular images.

Future research should aim to handle loose or non-conforming garments, incorporate temporal garment deformation from video sequences, and address layer-wise geometry for dynamic simulation, potentially integrating explicit physics in the reconstruction loop.

Conclusion

DAMA establishes a new paradigm for physically plausible, controllable, multi-layered human avatars via SMPL-X-anchored Gaussian representations. The method achieves clean garment disentanglement, explicit stacking, and intersection-free layering, validated by strong quantitative evidence and robust applications. This approach unlocks new avenues in avatar reconstruction, simulation, and controllable generative modeling for virtual environments and digital apparel workflows (2605.21001).
