
ITVTON: Virtual Try-On Diffusion Transformer Based on Integrated Image and Text

Published 28 Jan 2025 in cs.CV (arXiv:2501.16757v2)

Abstract: The virtual try-on task, grounded in person and garment images, has seen notable advances in the domain of diffusion models. Many approaches rely on replicated backbones or additional image encoders to extract garment features, which increases computational cost and complicates the network structure. In this work, we introduce ITVTON, which uses the Diffusion Transformer (DiT) as the generator to enhance image quality. ITVTON also improves garment-person interaction by stitching the garment and person images along the spatial dimension, and it integrates textual descriptions of both the garment and person images to further enhance the realism of the generated visuals. The resulting network structure is efficient, and to reduce computational cost further, we constrain training to the attention parameters within a single Diffusion Transformer (Single-DiT) block. Extensive experiments demonstrate that ITVTON outperforms baseline methods both qualitatively and quantitatively, thereby establishing a new benchmark for virtual try-on tasks. Additionally, 10,257 image pairs were selected from IGPair to demonstrate that ITVTON performs effectively in realistic scenes.
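The two efficiency ideas in the abstract, spatially stitching the person and garment images so one backbone attends across both, and limiting training to the attention parameters of the DiT blocks, can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: it assumes a PyTorch-style model whose attention sub-modules contain "attn" in their parameter names, and it assumes the stitching is a side-by-side concatenation along the width axis.

```python
import torch
import torch.nn as nn

def stitch_inputs(person: torch.Tensor, garment: torch.Tensor) -> torch.Tensor:
    """Concatenate person and garment images side by side along width.

    person, garment: (B, C, H, W) tensors with matching B, C, H.
    Returns a (B, C, H, 2W) canvas fed to the diffusion transformer as one input,
    so self-attention can relate garment and person regions directly.
    """
    return torch.cat([person, garment], dim=-1)

def select_attention_params(dit: nn.Module) -> list[nn.Parameter]:
    """Freeze all parameters except those in attention sub-modules.

    Assumes attention layers expose parameters whose names contain "attn";
    adjust the filter to match the actual DiT implementation.
    """
    trainable = []
    for name, param in dit.named_parameters():
        if "attn" in name:
            param.requires_grad = True
            trainable.append(param)
        else:
            param.requires_grad = False
    return trainable

# Usage sketch with dummy inputs
person = torch.randn(1, 3, 512, 384)
garment = torch.randn(1, 3, 512, 384)
stitched = stitch_inputs(person, garment)  # shape: (1, 3, 512, 768)
```

Restricting the optimizer to the parameters returned by `select_attention_params` is one plausible way to realize the "train only attention parameters" constraint; the exact layers the paper fine-tunes may differ.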
