Magic Clothing: Controllable Garment-Driven Image Synthesis (2404.09512v2)
Abstract: We propose Magic Clothing, a latent diffusion model (LDM)-based network architecture for the previously unexplored task of garment-driven image synthesis. When generating customized characters wearing the target garments under diverse text prompts, image controllability is the most critical issue, i.e., preserving the garment details while maintaining faithfulness to the text prompts. To this end, we introduce a garment extractor to capture the detailed garment features, and employ self-attention fusion to incorporate them into the pretrained LDMs, ensuring that the garment details remain unchanged on the target character. We then leverage joint classifier-free guidance to balance the control of garment features and text prompts over the generated results. Meanwhile, the proposed garment extractor is a plug-in module applicable to various finetuned LDMs, and it can be combined with other extensions such as ControlNet and IP-Adapter to enhance the diversity and controllability of the generated characters. Furthermore, we design Matched-Points-LPIPS (MP-LPIPS), a robust metric for evaluating the consistency between the target image and the source garment. Extensive experiments demonstrate that Magic Clothing achieves state-of-the-art results under various conditional controls for garment-driven image synthesis. Our source code is available at https://github.com/ShineChen1024/MagicClothing.
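To make the abstract's mechanisms concrete, a few hedged sketches follow. First, self-attention fusion: one common way to inject features from a parallel garment branch is to concatenate the garment tokens with the character tokens before computing keys and values, so the denoising UNet can attend to, and copy, garment detail. The sketch below is a minimal single-head illustration in PyTorch; the tensor layout and the exact wiring of Magic Clothing's garment extractor are assumptions, not the paper's verified implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusedSelfAttention(nn.Module):
    """Minimal single-head sketch of self-attention fusion: queries come
    from the character branch only, while keys/values also see tokens from
    a garment extractor branch (hypothetical layout; inputs are (B, N, C))."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor, garment: torch.Tensor) -> torch.Tensor:
        # Concatenating garment tokens lets attention transfer garment
        # detail onto the generated character without altering the queries.
        ctx = torch.cat([x, garment], dim=1)            # (B, N_x + N_g, C)
        q, k, v = self.to_q(x), self.to_k(ctx), self.to_v(ctx)
        return F.scaled_dot_product_attention(q, k, v)  # (B, N_x, C)
```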
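Second, joint classifier-free guidance over two conditions, garment features and the text prompt. A natural two-condition cascade in the spirit of Ho and Salimans (2022) is sketched below; the decomposition and the guidance scales are illustrative assumptions, not the paper's exact formula.

```python
import torch

@torch.no_grad()
def joint_cfg_noise(eps, z_t, t, c_garm, c_text,
                    s_garm: float = 2.5, s_text: float = 5.0):
    """Hypothetical joint classifier-free guidance. `eps` is a noise
    predictor eps(z_t, t, garment_cond, text_cond), where passing None
    drops that condition. The cascade first amplifies garment control,
    then text control on top of it; the scales are illustrative."""
    e_uncond = eps(z_t, t, None, None)
    e_garm = eps(z_t, t, c_garm, None)
    e_both = eps(z_t, t, c_garm, c_text)
    return (e_uncond
            + s_garm * (e_garm - e_uncond)
            + s_text * (e_both - e_garm))
```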
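Finally, an MP-LPIPS-style consistency score can be read as LPIPS averaged over patches cropped around matched point pairs between the source garment and the generated image (correspondences could come from an emergent-correspondence model such as the one cited below). The matching and patch extraction are assumed to happen upstream; this illustrates the idea, not the paper's exact protocol.

```python
import torch
import lpips  # https://github.com/richzhang/PerceptualSimilarity

def mp_lpips(garment_patches, target_patches, net: str = "vgg"):
    """Hedged sketch: average LPIPS over pre-matched patch pairs.
    Each patch is a (3, H, W) tensor scaled to [-1, 1], cropped around
    one matched point (the matching step is assumed upstream)."""
    loss_fn = lpips.LPIPS(net=net)
    dists = [loss_fn(g.unsqueeze(0), t.unsqueeze(0))
             for g, t in zip(garment_patches, target_patches)]
    return torch.stack(dists).mean()
```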
- GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023).
- Break-A-Scene: Extracting multiple concepts from a single image. In SIGGRAPH Asia 2023 Conference Papers. 1–12.
- InstructPix2Pix: Learning to follow image editing instructions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 18392–18402.
- MasaCtrl: Tuning-free mutual self-attention control for consistent image synthesis and editing. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 22560–22570.
- Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- MagicDance: Realistic human dance video generation with motions & facial expressions transfer. arXiv preprint arXiv:2311.12052 (2023).
- Attend-and-Excite: Attention-based semantic guidance for text-to-image diffusion models. ACM Transactions on Graphics (TOG) 42, 4 (2023), 1–10.
- VITON-HD: High-resolution virtual try-on via misalignment-aware normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 14131–14140.
- Emu: Enhancing image generation models using photogenic needles in a haystack. arXiv preprint arXiv:2309.15807 (2023).
- Scaling rectified flow transformers for high-resolution image synthesis. arXiv preprint arXiv:2403.03206 (2024).
- Taming transformers for high-resolution image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 12873–12883.
- An image is worth one word: Personalizing text-to-image generation using textual inversion. arXiv preprint arXiv:2208.01618 (2022).
- Expressive text-to-image generation with rich text. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 7545–7556.
- Taming the Power of Diffusion Models for High-Quality Virtual Try-On with Appearance Flow. In Proceedings of the 31st ACM International Conference on Multimedia. 7599–7607.
- Prompt-to-prompt image editing with cross attention control. arXiv preprint arXiv:2208.01626 (2022).
- Jonathan Ho and Tim Salimans. 2022. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598 (2022).
- LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685 (2021).
- Animate Anyone: Consistent and controllable image-to-video synthesis for character animation. arXiv preprint arXiv:2311.17117 (2023).
- ReVersion: Diffusion-based relation inversion from images. arXiv preprint arXiv:2303.13495 (2023).
- Multi-concept customization of text-to-image diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1931–1941.
- High-resolution virtual try-on with misalignment and occlusion-handled conditions. In European Conference on Computer Vision. Springer, 204–219.
- BLIP-Diffusion: Pre-trained subject representation for controllable text-to-image generation and editing. Advances in Neural Information Processing Systems 36 (2024).
- BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In International Conference on Machine Learning. PMLR, 12888–12900.
- Compositional visual generation with composable diffusion models. In European Conference on Computer Vision. Springer, 423–439.
- Cones 2: Customizable image synthesis with multiple subjects. arXiv preprint arXiv:2305.19327 (2023).
- Ilya Loshchilov and Frank Hutter. 2018. Fixing weight decay regularization in Adam.
- LaDI-VTON: Latent diffusion textual-inversion enhanced virtual try-on. In Proceedings of the 31st ACM International Conference on Multimedia. 8580–8589.
- Dress Code: High-resolution multi-category virtual try-on. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2231–2235.
- T2I-Adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models. arXiv preprint arXiv:2302.08453 (2023).
- Zero-shot image-to-image translation. In ACM SIGGRAPH 2023 Conference Proceedings. 1–11.
- SDXL: Improving latent diffusion models for high-resolution image synthesis. arXiv preprint arXiv:2307.01952 (2023).
- Learning transferable visual models from natural language supervision. In International Conference on Machine Learning. PMLR, 8748–8763.
- High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10684–10695.
- U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015: 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III. Springer, 234–241.
- DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 22500–22510.
- HyperDreamBooth: HyperNetworks for fast personalization of text-to-image models. arXiv preprint arXiv:2307.06949 (2023).
- LAION-5B: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems 35 (2022), 25278–25294.
- InstantBooth: Personalized text-to-image generation without test-time finetuning. arXiv preprint arXiv:2304.03411 (2023).
- Hand Keypoint Detection in Single Images using Multiview Bootstrapping. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- Emergent correspondence from image diffusion. Advances in Neural Information Processing Systems 36 (2023), 1363–1389.
- Plug-and-play diffusion features for text-driven image-to-image translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1921–1930.
- Concept decomposition for visual exploration and inspiration. ACM Transactions on Graphics (TOG) 42, 6 (2023), 1–13.
- InstantID: Zero-shot identity-preserving generation in seconds. arXiv preprint arXiv:2401.07519 (2024).
- Convolutional pose machines. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- GP-VTON: Towards general purpose virtual try-on via collaborative local-flow global-parsing learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 23550–23559.
- Inversion-Free Image Editing with Natural Language. arXiv preprint arXiv:2312.04965 (2023).
- Versatile diffusion: Text, images and variations all in one diffusion model. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 7754–7765.
- OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on. arXiv preprint arXiv:2403.01779 (2024).
- MagicAnimate: Temporally consistent human image animation using diffusion model. arXiv preprint arXiv:2311.16498 (2023).
- IP-Adapter: Text compatible image prompt adapter for text-to-image diffusion models. arXiv preprint arXiv:2308.06721 (2023).
- Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 3836–3847.
- The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 586–595.
- Better Fit: Accommodate Variations in Clothing Types for Virtual Try-on. arXiv preprint arXiv:2403.08453 (2024).
- SINE: Single image editing with text-to-image diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6027–6037.
- Uni-ControlNet: All-in-one control to text-to-image diffusion models. Advances in Neural Information Processing Systems 36 (2024).
- UniPC: A unified predictor-corrector framework for fast sampling of diffusion models. Advances in Neural Information Processing Systems 36 (2024).