HAIFIT: Human-to-AI Fashion Image Translation
Abstract: In the realm of fashion design, sketches serve as the canvas for expressing an artist's distinctive drawing style and creative vision, capturing intricate details like stroke variations and texture nuances. The advent of sketch-to-image cross-modal translation technology has notably aided designers. However, existing methods often compromise these sketch details during image generation, resulting in images that deviate from the designer's intended concept. This limitation hampers the ability to offer designers a precise preview of the final output. To overcome this challenge, we introduce HAIFIT, a novel approach that transforms sketches into high-fidelity, lifelike clothing images by integrating multi-scale features and capturing extensive feature map dependencies from diverse perspectives. Through extensive qualitative and quantitative evaluations conducted on our self-collected dataset, our method demonstrates superior performance compared to existing methods in generating photorealistic clothing images. Our method excels in preserving the distinctive style and intricate details essential for fashion design applications. In addition, our method also has obvious advantages in model training and inference speed, contributing to reducing designers' time costs and improving design efficiency.
- Unsupervised attention-guided image-to-image translation. Advances in neural information processing systems, 31, 2018.
- Sketch2photo: Internet image montage. ACM transactions on graphics (TOG), 28(5):1–10, 2009.
- Generative adversarial nets. Advances in Neural Information Processing Systems, 27, 2014.
- Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in neural information processing systems, 30, 2017.
- Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
- Multimodal unsupervised image-to-image translation. In Proceedings of the European conference on computer vision (ECCV), pages 172–189, 2018.
- Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1125–1134, 2017.
- A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4401–4410, 2019.
- U-gat-it: Unsupervised generative attentional networks with adaptive layer-instance normalization for image-to-image translation. ICLR, 2020.
- Bbdm: Image-to-image translation with brownian bridge diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1952–1961, 2023.
- Unsupervised sketch to photo synthesis. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part III 16, pages 36–52. Springer, 2020.
- Deepfashion: Powering robust clothes recognition and retrieval with rich annotations. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
- Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
- Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
- Fashion-gen: The generative fashion dataset and challenge. arXiv preprint arXiv:1806.08317, 2018.
- Palette: Image-to-image diffusion models. In ACM SIGGRAPH 2022 Conference Proceedings, pages 1–10, 2022.
- Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Attention is all you need. Advances in neural information processing systems, 30, 2017.
- High-resolution image synthesis and semantic manipulation with conditional gans. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 8798–8807, 2018.
- Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4):600–612, 2004.
- Styleme: Towards intelligent fashion generation with designer style. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, pages 1–16, 2023.
- Ufs-net: Unsupervised network for fashion style editing and generation. In 2023 IEEE International Conference on Multimedia and Expo (ICME), pages 2105–2110. IEEE, 2023.
- Sketchscene: Scene sketch to image generation with diffusion models. In 2023 IEEE International Conference on Multimedia and Expo (ICME), pages 2087–2092. IEEE, 2023.
- The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018.
- Toward multimodal image-to-image translation. Advances in neural information processing systems, 30, 2017.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.