DiffLocks: Generating 3D Hair from a Single Image using Diffusion Models (2505.06166v1)

Published 9 May 2025 in cs.CV

Abstract: We address the task of generating 3D hair geometry from a single image, which is challenging due to the diversity of hairstyles and the lack of paired image-to-3D hair data. Previous methods are primarily trained on synthetic data and cope with the limited amount of such data by using low-dimensional intermediate representations, such as guide strands and scalp-level embeddings, that require post-processing to decode, upsample, and add realism. These approaches fail to reconstruct detailed hair, struggle with curly hair, or are limited to handling only a few hairstyles. To overcome these limitations, we propose DiffLocks, a novel framework that enables detailed reconstruction of a wide variety of hairstyles directly from a single image. First, we address the lack of 3D hair data by automating the creation of the largest synthetic hair dataset to date, containing 40K hairstyles. Second, we leverage the synthetic hair dataset to learn an image-conditioned diffusion-transformer model that generates accurate 3D strands from a single frontal image. By using a pretrained image backbone, our method generalizes to in-the-wild images despite being trained only on synthetic data. Our diffusion model predicts a scalp texture map in which any point in the map contains the latent code for an individual hair strand. These codes are directly decoded to 3D strands without post-processing techniques. Representing individual strands, instead of guide strands, enables the transformer to model the detailed spatial structure of complex hairstyles. With this, DiffLocks can recover highly curled hair, like afro hairstyles, from a single image for the first time. Data and code are available at https://radualexandru.github.io/difflocks/

Summary

DiffLocks: Generating 3D Hair from a Single Image using Diffusion Models

This paper presents a novel approach to the challenging task of 3D hair reconstruction from a single image, addressing limitations of existing methods and datasets. The proposed framework, DiffLocks, directly generates detailed 3D hair geometry using a diffusion model conditioned on a single RGB image.

Challenges in 3D Hair Reconstruction

3D hair reconstruction remains a significant challenge, primarily due to the complex geometry and variety of hairstyles. Traditional methods rely heavily on synthetic datasets and, given the scarcity of such data, fall back on low-dimensional intermediate representations that often yield low-detail reconstructions. Moreover, existing solutions struggle with complex hairstyles such as curly or afro-like hair and require extensive post-processing to add realism. The paper identifies the scarcity of paired image-to-3D hair data and the difficulty of handling diverse hairstyles as the core problems.

DiffLocks Framework

DiffLocks automates the creation of a new, extensive synthetic dataset comprising 40,000 hairstyles, significantly expanding the variety and realism available for training. This dataset overcomes the barriers faced by previous methods that relied on limited, manually created synthetic data, and it enables the use of a sophisticated image-conditioned diffusion transformer (an Hourglass Diffusion Transformer) that, from a single RGB image, predicts a scalp texture map in which each point holds the latent code of an individual hair strand. These codes are decoded directly into 3D strands, without guide-strand upsampling or post-processing, addressing high-fidelity hair reconstruction head-on.
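
To make the representation concrete, the following is a minimal, hypothetical sketch (not the paper's actual architecture or sizes) of how a predicted scalp texture of per-texel strand latents could be sampled and decoded into individual 3D strands. `StrandDecoder`, the latent dimension, and the point count per strand are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StrandDecoder(nn.Module):
    """Hypothetical decoder: maps one strand latent code to a fixed-length
    polyline of 3D points (layer sizes are illustrative, not the paper's)."""
    def __init__(self, latent_dim=64, num_points=100):
        super().__init__()
        self.num_points = num_points
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, num_points * 3),
        )

    def forward(self, z):
        # z: (num_strands, latent_dim) -> (num_strands, num_points, 3)
        return self.mlp(z).view(-1, self.num_points, 3)

# Toy stand-in for the scalp texture the diffusion model would predict:
# each texel holds the latent code of one individual strand.
latent_dim, height, width = 64, 256, 256
scalp_texture = torch.randn(1, latent_dim, height, width)

# Sample latent codes at arbitrary scalp UV locations (grid_sample expects
# coordinates in [-1, 1]); every sampled texel yields one complete strand.
num_strands = 1000
uv = torch.rand(1, num_strands, 1, 2) * 2 - 1
codes = F.grid_sample(scalp_texture, uv, align_corners=True)  # (1, C, num_strands, 1)
codes = codes.squeeze(-1).squeeze(0).t()                      # (num_strands, latent_dim)

strands = StrandDecoder(latent_dim)(codes)  # (num_strands, 100, 3) strand polylines
print(strands.shape)
```

Because every sampled texel decodes to a complete strand, this kind of representation sidesteps the guide-strand upsampling stage that the paper identifies as a limitation of earlier pipelines.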

Results and Implications

Quantitative evaluations show marked improvements over existing methods, with DiffLocks achieving superior precision and recall across synthetic and real-world datasets. DiffLocks generates realistic and diverse hairstyles quickly, and its output can be integrated directly into real-time rendering engines such as Unreal Engine without further adjustment, which speaks to its potential applicability in media, gaming, and entertainment.
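
As a purely illustrative note on downstream use, the snippet below writes decoded strands to a simple per-strand point list with NumPy; an actual import into an engine such as Unreal would go through its own groom/hair asset pipeline, which is not shown here.

```python
import numpy as np

def export_strands(strands, path):
    """Write strands of shape (num_strands, points_per_strand, 3) as plain
    text: one 'strand <n>' header per strand followed by xyz lines.
    This is a generic placeholder format, not an engine-specific asset."""
    with open(path, "w") as f:
        for strand in strands:
            f.write(f"strand {len(strand)}\n")
            for x, y, z in strand:
                f.write(f"{x:.5f} {y:.5f} {z:.5f}\n")

# Example with dummy data: 1000 strands of 100 points each.
export_strands(np.zeros((1000, 100, 3), dtype=np.float32), "hair_strands.txt")
```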

The use of extensive synthetic data and diffusion models introduces a powerful prior for 3D hair that tames the complexity of modeling a vast range of hairstyles. The framework's ability to generalize robustly to in-the-wild images despite being trained solely on synthetic data marks an important advance in handling the diversity of hair types encountered in practical applications.
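
This generalization rests on conditioning the diffusion model on features from a frozen, pretrained image backbone. The sketch below uses a torchvision ResNet-50 purely as a stand-in, since the summary does not specify which backbone the paper actually uses.

```python
import torch
import torchvision.models as models

# Stand-in for the pretrained image backbone (the specific backbone is an
# assumption; the paper only states that a pretrained one is used).
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()      # keep pooled features, drop the classifier
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False            # frozen: only the diffusion model would be trained

image = torch.randn(1, 3, 224, 224)    # a single (preprocessed) frontal photo
with torch.no_grad():
    cond = backbone(image)             # (1, 2048) conditioning features for the diffusion model
print(cond.shape)
```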

Future Directions

Future work should explore hair elements such as braids and accessories, as well as facial hair such as beards and eyebrows, which remain outside the current dataset's scope. Expanding the dataset to cover such details could broaden the applicability and sophistication of digital-human modeling. Further research might also focus on optimizing hair rendering for the more constrained computing environments typical of gaming and interactive applications.

In conclusion, DiffLocks represents a substantial step forward in automatic 3D hair reconstruction from images, leveraging diffusion models and synthetic data to overcome long-standing limitations in fidelity and diversity. It provides a robust framework for practical implementation in real-time systems and presents potential pathways for future research in automated 3D modeling.
