SMPLitex: 3D Human Texture Diffusion

Updated 10 March 2026

SMPLitex is a generative diffusion framework that estimates and edits complete 3D human textures from a single image using SMPL and DensePose mapping.
It employs a streamlined, one-shot inpainting pipeline, leveraging latent diffusion and a curated dataset to support text-driven and structural UV texture manipulation.
Reported results show better SSIM and LPIPS scores than prior state-of-the-art methods on Market-1501 and THUman2.0.

SMPLitex is a generative diffusion-based framework and dataset for complete 3D human texture estimation and manipulation from a single image. It integrates recent advancements in latent diffusion modeling and 3D body modeling via SMPL, enabling high-fidelity synthesis, editing, and inpainting of UV-mapped textures directly associated with the SMPL mesh topology. SMPLitex introduces a methodologically streamlined pipeline and a curated dataset of high-quality 3D human textures, with evaluations demonstrating substantial improvement over prior state-of-the-art in both cross-view reconstruction accuracy and support for diverse text-driven and structural editing tasks (Casas et al., 2023).

1. Model Architecture and Integration with SMPL

The SMPLitex system adopts a latent diffusion model (LDM) backbone as described by Rombach et al. (2022), omitting adversarial components such as GAN discriminators and relying entirely on the diffusion paradigm for both training and inference. Within this architecture, a frozen variational autoencoder (VAE) encoder $\mathcal{E}$ projects 2D (or UV) images into a spatial latent variable $z \in \mathbb{R}^{h \times w \times c}$. The exact dimensions are not stated, but the reference to Stable Diffusion v1-4 implies that $512 \times 512$ images are downsampled to a $64 \times 64 \times 4$ latent representation.
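The latent size quoted above follows from simple arithmetic. The sketch below assumes the Stable Diffusion v1 convention of an 8x spatial downsampling factor and 4 latent channels; the helper name `latent_shape` is hypothetical and is not part of any SMPLitex code.

```python
def latent_shape(height: int, width: int, factor: int = 8, channels: int = 4):
    """Return the (h, w, c) latent shape for an image of the given size.

    Assumes an 8x downsampling VAE with 4 latent channels (Stable Diffusion v1
    convention); both defaults are assumptions, not values stated by the authors.
    """
    if height % factor or width % factor:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (height // factor, width // factor, channels)


print(latent_shape(512, 512))  # (64, 64, 4)
```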

The generative process is orchestrated by a time-conditional U-Net $\epsilon_\theta$ operating over $(z_t, c, t)$, where $z_t$ is the noisy latent, $c$ is the context encoding (textual description and/or partial UV observation), and $t$ is the diffusion timestep.

SMPL integration is achieved by leveraging the SMPL parametric body model $M(\theta, \beta)$ (pose and shape), combined with DensePose correspondences $d(p)$ for pixel-to-UV mapping and a silhouette mask $s(p)$ to extract visible subject regions. The partial UV map $u_{\text{part}}$ is then assembled by projecting masked image pixels into UV space:

$u_{\text{part}} = \Pi(x, d \odot s)$

where $\odot$ denotes element-wise product and $\Pi$ distributes image colors into corresponding UV bins.
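A minimal sketch of what the projection $\Pi$ might look like, under several assumptions: the DensePose output has already been converted to a single $(u, v) \in [0, 1]^2$ coordinate per pixel in the SMPL atlas, colliding pixels are averaged, and the texture row index grows with $v$. The function name `project_to_uv` and its data layout are hypothetical.

```python
def project_to_uv(image, dense_uv, silhouette, uv_size):
    """Scatter visible image pixels into a uv_size x uv_size texture.

    image:      H x W nested list of (r, g, b) tuples
    dense_uv:   H x W nested list of (u, v) floats in [0, 1], or None if undefined
    silhouette: H x W nested list of 0/1 flags (1 = subject pixel)
    Returns (texture, visible); visible marks texels that received a colour.
    """
    sums = [[[0.0, 0.0, 0.0] for _ in range(uv_size)] for _ in range(uv_size)]
    counts = [[0] * uv_size for _ in range(uv_size)]
    for y, row in enumerate(image):
        for x, rgb in enumerate(row):
            uv = dense_uv[y][x]
            if not silhouette[y][x] or uv is None:
                continue  # background pixel or no correspondence
            col = min(int(uv[0] * uv_size), uv_size - 1)
            line = min(int(uv[1] * uv_size), uv_size - 1)
            for k in range(3):
                sums[line][col][k] += rgb[k]
            counts[line][col] += 1
    # Average pixels that landed in the same texel (an assumption of this sketch).
    texture = [[(0.0, 0.0, 0.0)] * uv_size for _ in range(uv_size)]
    for r in range(uv_size):
        for c in range(uv_size):
            if counts[r][c]:
                texture[r][c] = tuple(s / counts[r][c] for s in sums[r][c])
    visible = [[int(n > 0) for n in row] for row in counts]
    return texture, visible
```

The `visible` mask is what an inpainting model would need in order to know which texels are observed and which must be synthesized.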

The generative diffusion is guided by this $u_{\text{part}}$ (either concatenated or through cross-attention) as a conditioning signal, directing inpainting of missing texels to complete the full SMPL UV texture.

2. Training Objectives and Optimization

SMPLitex optimization follows the standard LDM denoising loss, specifically the $L_2$ noise prediction objective:

$\mathcal{L}_{\text{diff}} = \mathbb{E}_{z_t\sim q(z_t|z_0),\,t,\,c,\,\epsilon\sim\mathcal{N}(0,1)} \left[ \|\epsilon - \epsilon_\theta(z_t,c,t)\|_2^2 \right]$
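The objective can be illustrated with a single-sample estimate in plain Python. This is a sketch, not the authors' training code: latents are flat lists, `alpha_bar` is an assumed cumulative noise schedule, and `eps_model` stands in for the U-Net $\epsilon_\theta$.

```python
import math
import random


def add_noise(z0, eps, alpha_bar_t):
    """Forward process q(z_t | z_0): z_t = sqrt(a) * z_0 + sqrt(1 - a) * eps."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [a * z + b * e for z, e in zip(z0, eps)]


def denoising_loss(eps_model, z0, context, t, alpha_bar, rng):
    """One-sample estimate of ||eps - eps_theta(z_t, c, t)||_2^2."""
    eps = [rng.gauss(0.0, 1.0) for _ in z0]      # eps ~ N(0, I)
    z_t = add_noise(z0, eps, alpha_bar[t])       # noisy latent at step t
    pred = eps_model(z_t, context, t)            # predicted noise
    return sum((e - p) ** 2 for e, p in zip(eps, pred))


# Usage with a placeholder model that always predicts zero noise.
alpha_bar = [0.99, 0.5, 0.1]
zero_model = lambda z_t, c, t: [0.0] * len(z_t)
loss = denoising_loss(zero_model, [0.5, -1.0, 0.25, 2.0], None, 1, alpha_bar,
                      random.Random(0))
```

In practice the expectation is approximated by averaging this quantity over minibatches with randomly drawn timesteps; many implementations also use the mean rather than the sum over latent elements, which only rescales the loss.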

No adversarial, explicit $L_1/L_2$ reconstruction, perceptual (VGG), or additional regularization losses are reported beyond this denoising term. For transfer from a general image LDM to UV-mapped textures, the authors supplement training with the “prior-preservation” loss as in DreamBooth (Ruiz et al. 2023), mitigating catastrophic forgetting, although its formula is not explicitly provided.
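For orientation, a prior-preservation objective in the style of DreamBooth can be written in noise-prediction form as below. This is an assumed form shown for illustration, not a formula given by the SMPLitex authors: $z^{\text{pr}}_t$ denotes a noised latent of a sample generated by the frozen pretrained model from a generic class prompt $c_{\text{pr}}$, and $\lambda$ is a weighting hyperparameter.

$\mathcal{L} = \mathcal{L}_{\text{diff}} + \lambda\, \mathbb{E}_{z^{\text{pr}}_t,\,t,\,\epsilon'\sim\mathcal{N}(0,1)} \left[ \|\epsilon' - \epsilon_\theta(z^{\text{pr}}_t, c_{\text{pr}}, t)\|_2^2 \right]$

The second term keeps the fine-tuned model able to reproduce the pretrained model's own samples, which is how it counteracts forgetting.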

3. Dataset Composition and Sampling Methodology

The fine-tuning phase uses a dataset of 10 high-quality UV texture maps from earlier SMPL reconstruction studies (Alldieck et al. 2018; Lazova et al. 2019). These serve as targets to steer the pretrained latent diffusion backbone toward learning SMPL-style UV parametrizations.

The resulting SMPLitex dataset comprises 100 curated and diversified human UV textures, each generated via classifier-free diffusion sampling (guidance scale 2.0, 50 denoising steps) and paired with textual prompts describing specific clothing, accessory, and identity attributes. All textures are standardized to $512\times512$ UV resolution. No explicit closed-form data distribution is specified, implying sampling is performed in a prompt-driven, controlled-but-diverse fashion.
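Classifier-free guidance combines a conditional and an unconditional noise prediction at every denoising step. The sketch below assumes the common convention in which a scale of 1.0 reproduces the conditional prediction; whether the reported scale of 2.0 uses this convention is not stated, and the function name is hypothetical.

```python
def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale=2.0):
    """Guided noise estimate: eps_uncond + s * (eps_cond - eps_uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# With scale 2.0 the prediction moves twice as far from the unconditional
# estimate as the conditional estimate does.
print(classifier_free_guidance([0.0, 1.0], [1.0, 3.0]))  # [2.0, 5.0]
```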

4. One-Shot 3D Texture Fitting Pipeline

Given a single input image $x$, the pipeline proceeds as follows:

Detect the person in the image and estimate the SMPL pose $\theta$ and shape $\beta$.
Compute DensePose correspondences $d(p)$ for visible pixels and generate a binary silhouette mask $s(p)$.
Assemble the partial UV map: $u_{\text{part}} = \Pi(x, d \odot s)$.
Condition the LDM inpainting model on $u_{\text{part}}$ and sample completions for missing or occluded texels, starting from $z_T \sim \mathcal{N}(0, I)$ and iteratively denoising for $t = T, \dots, 1$:

$z_{t-1} \sim p_\theta(z_{t-1} \mid z_t,\, u_{\text{part}},\, t)$

Decode the final latent with the VAE decoder $D$:

$u_{\text{full}} = D(z_0)$

This procedure is inference-only: it runs the iterative LDM sampler using forward passes of the denoiser, with no per-image optimization or backpropagation over the inputs.
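The sampling stage of the pipeline above can be sketched as a short loop. The source does not name the sampler, so a deterministic DDIM-style update is assumed here; `eps_model` is a placeholder for the conditioned U-Net, latents are flat lists, and the final VAE decoding step is omitted.

```python
import math
import random


def ddim_sample(eps_model, u_part, alpha_bar, dim, rng):
    """Deterministic DDIM-style sampling from z_T ~ N(0, I) down to z_0.

    alpha_bar[t] is the cumulative signal level at step t (decreasing in t,
    strictly between 0 and 1). This schedule is an assumption of the sketch.
    """
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]           # z_T ~ N(0, I)
    for t in range(len(alpha_bar) - 1, -1, -1):
        a_t = alpha_bar[t]
        a_prev = alpha_bar[t - 1] if t > 0 else 1.0
        eps = eps_model(z, u_part, t)                       # predicted noise
        # Estimate of the clean latent implied by the noise prediction.
        z0_hat = [(zi - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t)
                  for zi, e in zip(z, eps)]
        # Move to the previous noise level without injecting fresh noise.
        z = [math.sqrt(a_prev) * z0 + math.sqrt(1.0 - a_prev) * e
             for z0, e in zip(z0_hat, eps)]
    return z
```

Each loop iteration is a single forward pass of the denoiser, which is why no gradients with respect to the input image or the texture are needed at inference time.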

5. Experimental Results and Quantitative Performance

SMPLitex’s performance is evaluated across three benchmarks: Market-1501 (cross-view re-rendering, $64\times128$ images), THUman2.0 (multi-view render-from-scan), and DeepFashion-MultiModal (qualitative assessment on $750 \times 1101$ images).

Dataset	Metric	TexFormer	CMR	HPBTT	RSTG	TexGlo	SMPLitex (change vs. TexFormer)
Market-1501	SSIM (↑)	0.7422	0.7142	0.7420	0.6735	0.6658	0.8648 (+0.12)
Market-1501	LPIPS (↓)	0.1154	0.1275	0.1168	0.1778	0.1776	0.0695 (–0.0459)
THUman2.0	SSIM (↑)	0.8761	–	–	–	–	0.8829 (+0.0068)
THUman2.0	LPIPS (↓)	0.1223	–	–	–	–	0.1067 (–0.0156)

On Market-1501, SMPLitex improves SSIM by about 0.12 over the strongest baseline, TexFormer (0.8648 vs. 0.7422), and lowers the perceptual error (LPIPS) by 0.0459 (0.0695 vs. 0.1154). On THUman2.0, it also improves on TexFormer, by 0.0068 in SSIM and 0.0156 in LPIPS. The qualitative DeepFashion-MultiModal evaluation shows better recovery of high-frequency details (e.g., garment wrinkles, facial anisotropy) than GAN-based methods.
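The Market-1501 margins can be checked directly from the reported scores. The snippet below only re-derives the differences from the table values; the helper name is hypothetical.

```python
# Market-1501 scores copied from the results table.
ssim = {"TexFormer": 0.7422, "CMR": 0.7142, "HPBTT": 0.7420,
        "RSTG": 0.6735, "TexGlo": 0.6658, "SMPLitex": 0.8648}
lpips = {"TexFormer": 0.1154, "CMR": 0.1275, "HPBTT": 0.1168,
         "RSTG": 0.1778, "TexGlo": 0.1776, "SMPLitex": 0.0695}


def margin_over_best_baseline(scores, ours="SMPLitex", higher_is_better=True):
    """Return (best baseline name, our score minus its score, rounded to 4 dp)."""
    baselines = {name: value for name, value in scores.items() if name != ours}
    pick = max if higher_is_better else min
    best = pick(baselines, key=baselines.get)
    return best, round(scores[ours] - baselines[best], 4)


print(margin_over_best_baseline(ssim))
print(margin_over_best_baseline(lpips, higher_is_better=False))
```

TexFormer is the strongest baseline on both metrics, with differences of +0.1226 in SSIM and -0.0459 in LPIPS.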

Qualitative examination reveals artifact-free novel view synthesis, plausible completions in occluded regions, and sharp attribute transfer across diverse poses and body shapes.

6. Applications and Capabilities

SMPLitex enables a range of editing and synthesis tasks:

Partial-mask inpainting: Arbitrary UV regions in $u_{\text{part}}$ can be replaced and inpainted, supporting localized editing such as color changes or logo addition.
Novel-view synthesis: Completed $u_{\text{full}}$ textures mapped onto $M(\theta, \beta)$ are rendered from arbitrary viewpoints, consistently across pose sequences due to the SMPL UV parameterization.
Text-driven attribute manipulation: The LDM backbone allows sampling conditioned on text prompts, supporting synthesis of textures with user-defined clothing, appearance, or accessories. Interpolation in CLIP-embedding or diffusion latent space facilitates smooth morphing between attribute sets.
Latent-space editing: For prompt embeddings $e_1$, $e_2$ (of text prompts $t_1$, $t_2$), forming $e_\alpha = (1-\alpha)e_1 + \alpha e_2$ with $\alpha \in [0, 1]$ enables synthesis of intermediate textures with interpolated attributes.
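The embedding interpolation described in the last item is a plain linear blend. A minimal sketch, assuming the prompt embeddings are available as equal-length vectors (real text-encoder outputs are token-by-channel arrays, to which the same element-wise rule applies); the function name is hypothetical.

```python
def lerp_embeddings(e1, e2, alpha):
    """Return e_alpha = (1 - alpha) * e1 + alpha * e2 for alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(e1) != len(e2):
        raise ValueError("embeddings must have the same length")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(e1, e2)]


# A quarter of the way from e1 to e2.
print(lerp_embeddings([0.0, 2.0], [4.0, 6.0], 0.25))  # [1.0, 3.0]
```

Sweeping `alpha` from 0 to 1 and sampling a texture at each value, with a fixed initial noise, would give a morph sequence between the two prompts.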

7. Contributions and Release

SMPLitex’s principal contributions are:

A fine-tuned latent diffusion backbone that natively generates high-fidelity, fully-differentiable SMPL UV textures.
A one-shot single-image fitting pipeline requiring only DensePose correspondences and silhouette masking, allowing high-quality texture completion for previously unseen subjects.
Public release of both 100 curated SMPL-mapped textures (standardized to $512\times512$ UV format) and the trained diffusion model, thus providing reusable assets for downstream research in human texture synthesis, text-driven editing, and 3D avatar creation (Casas et al., 2023).

References (1)

SMPLitex: A Generative Model and Dataset for 3D Human Texture Estimation from Single Image (2023)
