Mobile Fitting Room: On-device Virtual Try-on via Diffusion Models (2402.01877v1)

Published 2 Feb 2024 in cs.HC, cs.AI, and cs.LG

Abstract: The growing digital landscape of fashion e-commerce calls for interactive and user-friendly interfaces for virtually trying on clothes. Traditional try-on methods grapple with challenges in adapting to diverse backgrounds, poses, and subjects. While newer methods, utilizing the recent advances of diffusion models, have achieved higher-quality image generation, the human-centered dimensions of mobile interface delivery and privacy concerns remain largely unexplored. We present Mobile Fitting Room, the first on-device diffusion-based virtual try-on system. To address multiple inter-related technical challenges such as high-quality garment placement and model compression for mobile devices, we present a novel technical pipeline and an interface design that enables privacy preservation and user customization. A usage scenario highlights how our tool can provide a seamless, interactive virtual try-on experience for customers and provide a valuable service for fashion e-commerce businesses.
