Garment Transfer and Stacking

Updated 1 June 2026

Garment Transfer and Stacking is a domain focused on methods for moving garments between digital avatars or robots and arranging layered attire with high geometric and physical plausibility.
Techniques such as mesh-based layering, implicit surface reconstruction, Gaussian splats, and autoregressive video latents enable accurate garment transfer and realistic stacking in simulation and robotics.
Robotic and simulation approaches like DexGarmentLab and HALO demonstrate effective garment manipulation with quantifiable metrics, advancing virtual try-on and autonomous textile handling.

Garment transfer and stacking refer to the computational, algorithmic, and robotic techniques for transferring garments between digital avatars or physical agents and arranging (stacking) multiple garments on a single human model, robot, or video sequence. This domain underpins a wide range of applications in virtual try-on, character animation, simulation, and autonomous manipulation in textile robotics. Leading research addresses the geometric, appearance, and physical plausibility of layered clothing, as well as efficient, interactive manipulation in simulation and real-world environments.

1. Mathematical Representations and Model Architectures

Digital garment transfer leverages structured body models, parametric templates, and implicit/explicit representations to reconstruct and manipulate layered apparel.

Mesh-based Layering: Multi-Garment Network (MGN) employs the SMPL body model $M(\theta, \beta, D)$ with blend shapes and learned per-garment deformations. Each garment $g$ is parameterized in T-pose using a PCA basis $B^g$ with coefficients $z^g$ and a high-frequency residual $D^{\mathrm{hf},g}$, reconstructed by $G^g = B^g z^g + D^{\mathrm{hf},g}$ and posed via skinning $G^g(\theta, \beta) = W(T^g(\beta), J(\beta), \theta, \mathcal{W})$, where $W(\cdot)$ is the SMPL skinning function, $T^g(\beta)$ the unposed garment, $J(\beta)$ the joint locations, and $\mathcal{W}$ the blend weights (Bhatnagar et al., 2019).
Implicit Surface Reconstruction: DI-Net introduces a pixel-aligned implicit function mapping features $F(x)$ and per-point depth to occupancy and color. Layer stacking is achieved by mask-based decomposition and implicit color assignment to visible surfaces only, controlled by occupancy thresholds and implicit ray-marching (Zhong et al., 2023).
Gaussian Splat Representations: DAMA parameterizes each garment layer as collections of Gaussians anchored to SMPL-X faces, $(b,\delta,q_r)$, where $b$ is the face-barycentric coordinate, $\delta$ is a strictly positive normal offset, and $q_r$ is a relative rotation. This ensures each garment remains strictly above the prior surface and enables explicit, intersection-free stacking (Eskandar et al., 20 May 2026).
Autoregressive Video Latents: FashionChameleon models video-based garment transfer and stacking using concatenated VAE-encoded latents for reference, garment, and video frames, orchestrating transformer-based multi-modal attention over chunked video sequences (Song et al., 15 May 2026).
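The face-anchored parameterization described for the Gaussian splat representation can be illustrated with a small NumPy sketch. It assumes, as one plausible reading of the description, that an anchor's center is the barycentric point on its face pushed out along the unit face normal by $\delta$; the relative rotation $q_r$ is ignored, and the function names `face_normals` and `anchor_centers` are hypothetical, not part of any published API.

```python
import numpy as np

def face_normals(vertices, faces):
    """Unit normal of every triangle; faces is an (F, 3) integer array."""
    tri = vertices[faces]                                   # (F, 3, 3)
    normals = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    return normals / np.linalg.norm(normals, axis=1, keepdims=True)

def anchor_centers(vertices, faces, face_ids, bary, delta):
    """Centers for anchors given as (face id, barycentric b, normal offset delta)."""
    if np.any(delta <= 0):
        raise ValueError("normal offsets must be strictly positive")
    tri = vertices[faces[face_ids]]                         # (N, 3, 3)
    on_surface = np.einsum("nk,nkd->nd", bary, tri)         # barycentric blend
    normals = face_normals(vertices, faces)[face_ids]
    return on_surface + delta[:, None] * normals            # lift above the surface

# Toy "body": one triangle in the z = 0 plane, plus a scaled and lifted copy
# standing in for a second body with the same mesh topology.
faces = np.array([[0, 1, 2]])
source = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
target = 2.0 * source + np.array([0.0, 0.0, 1.0])

face_ids = np.array([0])
bary = np.array([[0.25, 0.25, 0.5]])
delta = np.array([0.02])

on_source = anchor_centers(source, faces, face_ids, bary, delta)
on_target = anchor_centers(target, faces, face_ids, bary, delta)
```

Because the anchor is stored relative to a face rather than in world coordinates, evaluating the same `(face_ids, bary, delta)` on a different body with matching topology re-places the point on that body while keeping it above the surface.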

2. Algorithms for Garment Transfer

Garment transfer tasks require mapping a garment's geometry and/or appearance from a source (image, 3D scan, or video) to a new target (subject, pose, or context):

Feature Correspondence and Warping: DI-Net's Complementary Warping Module learns dense correspondences between source and target using cosine-similarity matrices and softmax-weighted warping, complemented with sparse flow-based patch sampling to align high-frequency garment detail with pose-conditioned geometry (Zhong et al., 2023). This enables the faithful transfer of individual garment regions and their recomposition in target scenes.
Per-Garment Anchoring and Reparameterization: DAMA encodes each garment's geometry as a function of the underlying SMPL-X surface. Garment transfer proceeds by extracting barycentric and normal-offset parameters from the source and applying them to the corresponding faces/normals on the target mesh, with adjustment for stack order to guarantee intersection-free placement (Eskandar et al., 20 May 2026).
Semantic Retargeting: MGN leverages learned garment templates in a shared parametric basis, predicting garment parameters from image features and associating them with the target body/pose. Each garment can be directly reposed or transferred between body models, supporting arbitrary dressing sequences (Bhatnagar et al., 2019).
Interactive Video and Latent Switching: FashionChameleon enforces garment transfer in an autoregressive frame sequence by replacing garment-key/value pairs in the KV cache, rescheduling conditional attention, and maintaining pose and appearance consistency during dynamic, streaming garment switching (Song et al., 15 May 2026).
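The correspondence-based warping mentioned above (cosine-similarity matrices followed by softmax-weighted warping) can be sketched generically in NumPy. This is an illustration of the general technique under stated assumptions, not DI-Net's actual module: the function name `softmax_warp`, the flat `(N, C)` feature layout, and the `temperature` value are all hypothetical choices.

```python
import numpy as np

def softmax_warp(target_feat, source_feat, source_values, temperature=0.01):
    """Warp source_values onto target locations via cosine-similarity attention."""
    t = target_feat / np.linalg.norm(target_feat, axis=1, keepdims=True)
    s = source_feat / np.linalg.norm(source_feat, axis=1, keepdims=True)
    similarity = t @ s.T                              # (Nt, Ns) cosine similarities
    logits = similarity / temperature                 # assumed sharpening factor
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)     # row-wise softmax
    return weights @ source_values, weights

# Two source locations with orthogonal features and distinct colors.
source_feat = np.array([[1.0, 0.0], [0.0, 1.0]])
source_colors = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])   # red, blue

# Target features point in the same directions but in swapped order.
target_feat = np.array([[0.0, 2.0], [3.0, 0.0]])

warped, weights = softmax_warp(target_feat, source_feat, source_colors)
```

With a small temperature the softmax is nearly one-hot, so each target location effectively copies the color of its best-matching source location; a larger temperature blends several source locations.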

3. Techniques and Constraints for Garment Stacking

Stacking multiple garments raises challenges in separation, ordering, and occlusion handling:

Explicit Layer Control: DAMA supports a user-defined stack order over the garment layers; each layer is offset along the surface normal by its strictly positive offset $\delta$, iteratively pushing new garments strictly outside all lower layers. The reorder_layers API call ensures that any permutation of the stack can be realized without intersection (Eskandar et al., 20 May 2026).
Semantic Mask Composition: In DI-Net, region masks from warped parsing maps allow arbitrary garment and limb assignments to stack or reorder visual garment layers, while volume-based occupancy fields ensure only visible points are assigned surface color. Occlusion resolution is performed by ray-marching in the implicit field, supporting free-form topology and multiple source garments (Zhong et al., 2023).
Hierarchical Mesh Aggregation: MGN concatenates posed garment meshes above the body model into a single vertex array, using registration losses to avoid interpenetration and relying on per-vertex offsets so that each garment "hovers" just above prior layers (Bhatnagar et al., 2019).
Autoregressive Dynamic Stacking: In FashionChameleon, garment stacking/switching during video generation is reduced to a latent cache manipulation: garment KV refresh, history KV zero-out, and pose-consistent reference KV recomputation. Each new garment or combo is realized at interactive latency without retraining (Song et al., 15 May 2026).
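The explicit layer control idea, where every layer carries a strictly positive normal offset and is pushed outside everything beneath it, can be sketched as a cumulative sum over the stack order. This is a simplified illustration, assuming all layers are sampled at the same surface points and that offsets simply accumulate; `stack_heights` is a hypothetical helper, not the `reorder_layers` call named in the text.

```python
import numpy as np

def stack_heights(layer_offsets, order):
    """Height of each layer above the body along the normal, for one stack order.

    layer_offsets: dict mapping layer name -> strictly positive offsets, shape (P,)
    order: list of layer names, innermost first
    """
    heights = {}
    current = 0.0                                  # the body surface itself
    for name in order:
        delta = np.asarray(layer_offsets[name], dtype=float)
        if np.any(delta <= 0):
            raise ValueError(f"offsets of layer {name!r} must be strictly positive")
        current = current + delta                  # push outside all lower layers
        heights[name] = current
    return heights

offsets = {
    "shirt": np.array([0.004, 0.005]),             # meters, two sample points
    "jacket": np.array([0.010, 0.008]),
}

shirt_inside = stack_heights(offsets, ["shirt", "jacket"])
jacket_inside = stack_heights(offsets, ["jacket", "shirt"])
```

Since every increment is strictly positive, each layer ends up strictly above the one before it for any permutation of `order`, which is the property that makes reordering intersection-free in this simplified setting.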

4. Robotic and Physical Manipulation of Garments

Dexterous robotic garment manipulation, including transfer (handover) and stacking (placement), is addressed by simulation environments and hierarchical policies:

DexGarmentLab Environment: Dual-arm/dual-hand robotic agents, simulated via Isaac Sim with Position-Based Dynamics (PBD) for large garments and Finite-Element Method (FEM) for small elastic items, perform specified folding, transferring, and stacking protocols with refined adhesion and friction models (Wang et al., 16 May 2025).
Hierarchical Policies for Manipulation: The HALO framework comprises a Garment Affordance Model (GAM) for transferable grasp point prediction (via contrastive PointNet++ features and InfoNCE loss) and a Structure-Aware Diffusion Policy (SADP) for generating bimanual manipulation trajectories, conditioned on state, affordance, and environment encodings (Wang et al., 16 May 2025).
Task Definitions and Metrics: Typical transfer tasks include Fold Tops, Fold Dress, and Fold Trousers, each consisting of phased pick–handover–regrasp cycles. Stacking (Store Tops) involves a preparatory fold and coordinated placement at a platform center; a trial counts as a success when the garment's projected centroid lies within 0.1 m of the target (Wang et al., 16 May 2025).
Performance: HALO outperforms prior bimanual policies, achieving success rates of 0.85 ± 0.05 for Fold Tops and 0.80 ± 0.02 for Store Tops in simulation; in real-world trials it succeeds in 13/15 attempts at tops folding, compared to 9/15 for the Diffusion Policy (DP) baseline (Wang et al., 16 May 2025).
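The Store Tops success criterion above (projected centroid within 0.1 m of the target) is simple to express in code. The sketch below assumes that "projected" means dropping the vertical coordinate of the garment's point cloud and that the boundary case counts as a success; `store_success` and the sample coordinates are hypothetical.

```python
import numpy as np

def store_success(garment_points, target_xy, threshold=0.1):
    """Return (success, distance) for the projected-centroid criterion.

    garment_points: (N, 3) garment particle or vertex positions in meters
    target_xy: (2,) target location on the platform plane
    """
    centroid_xy = garment_points[:, :2].mean(axis=0)        # drop z, then average
    distance = float(np.linalg.norm(centroid_xy - np.asarray(target_xy)))
    return distance <= threshold, distance

garment = np.array([
    [0.45, 0.00, 0.02],
    [0.55, 0.00, 0.02],
    [0.50, 0.06, 0.03],
])
target = np.array([0.50, 0.00])

placed_ok, placed_dist = store_success(garment, target)
shifted_ok, shifted_dist = store_success(garment + np.array([0.2, 0.0, 0.0]), target)
```

Here the first placement has its centroid 0.02 m from the target and passes, while the copy shifted by 0.2 m along x fails.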

5. Quantitative Evaluation and Comparison

Rigorous evaluation across geometry, appearance, and manipulation is critical.

| Method (dataset, setting) | Geometry / fidelity (Chamfer in mm unless noted) | Penetration rate (%) | Layering/stacking control |
|---|---|---|---|
| DAMA (4D-DRESS, avatar, upper) | 23.02 | 0.56 | Explicit reorderable stack (Eskandar et al., 20 May 2026) |
| GALA (4D-DRESS, upper) | 23.97 | 29.95 | No explicit stack ordering |
| Disco4D (4D-DRESS, upper) | 40.31 | 45.20 | No explicit stack ordering |
| DI-Net (MGN, UI) | SSIM 0.9714 | n/a | Arbitrary mask-based layering (Zhong et al., 2023) |
| MGN (3D scan) | 5.78 (8 views, body+garments) | n/a | Mesh stack, limited template (Bhatnagar et al., 2019) |
| HALO (DexGarmentLab, sim folds) | n/a | n/a | Handover and platform stacking (Wang et al., 16 May 2025) |

SSIM, FID, and LPIPS are used to assess the fidelity of rendered images; DAMA additionally reports average penetration depths (0.30–0.32 mm for upper garments).
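For orientation, the two geometric quantities in the comparison can be computed as follows. This is a minimal sketch under stated assumptions: the Chamfer distance is taken as the symmetric mean of unsquared nearest-neighbor distances (the cited papers may use squared distances or a different normalization), and the penetration rate is taken as the percentage of garment samples whose signed distance to the underlying surface is negative, with those signed distances assumed to be precomputed. Both function names are hypothetical.

```python
import numpy as np

def chamfer_distance(points_a, points_b):
    """Symmetric mean nearest-neighbor distance between two point sets."""
    diff = points_a[:, None, :] - points_b[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)           # (Na, Nb) pairwise distances
    a_to_b = dist.min(axis=1).mean()               # each a to its nearest b
    b_to_a = dist.min(axis=0).mean()               # each b to its nearest a
    return 0.5 * (a_to_b + b_to_a)

def penetration_rate(signed_distances):
    """Percentage of samples lying inside the underlying surface (distance < 0)."""
    signed_distances = np.asarray(signed_distances, dtype=float)
    return 100.0 * float(np.mean(signed_distances < 0))

# Two points and a copy displaced by 3 mm along z (coordinates in mm).
predicted = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
reference = predicted + np.array([0.0, 0.0, 3.0])

chamfer_mm = chamfer_distance(predicted, reference)
rate = penetration_rate([0.5, -0.1, 0.2, 0.3])
```

The brute-force pairwise matrix is fine for small point sets; for dense scans a spatial index such as a k-d tree would normally replace it.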

6. Limitations, Open Challenges, and Future Directions

Several open challenges constrain current systems:

Physical Plausibility and Dynamics: Although DAMA and MGN constrain or regularize penetration and facilitate stacking, realistic modeling of dynamic deformations, wrinkles, and cloth–cloth/cloth–body contact remains limited. MGN lacks explicit simulation and is limited to static skins (Bhatnagar et al., 2019), while DAMA enforces a strict "outside" constraint but operates primarily per-scene (Eskandar et al., 20 May 2026).
Template and Topology Extensibility: MGN is restricted to five garment classes and cannot generalize to arbitrary cloth types or complex multipart stacking (e.g. capes, aprons, multi-jacket layers) without substantial template library expansion and more sophisticated blend-shape or nonrigid deformation models (Bhatnagar et al., 2019).
Cross-Domain Consistency: DI-Net can compositely stack garments from arbitrary references, but relies on learned parsing and occupancy thresholds that may introduce artifacts with highly noncontiguous or novel garments (Zhong et al., 2023).
Robust Real-World Manipulation: Although DexGarmentLab and HALO narrow the sim-to-real gap with improved PBD/FEM and large-scale trajectory synthesis, folding and stacking success in the real world still depends on the granularity of affordance detection, task randomization, and physical grasp/contact modeling (Wang et al., 16 May 2025).
Interactive Video Stacking: FashionChameleon uniquely enables real-time stacking and switching of garments in video streams, but its consistency for complex interactions or physically implausible layerings is bounded by the priors learned from single-garment training pairs (Song et al., 15 May 2026).

Further directions may include dynamic garment physics integration, universal mesh and Gaussian layer reparameterization across large shape topologies, and reinforcement learning policies for robust bimanual stacking and transfer under unconstrained deformations.

Key References: