Mobile Fitting Room: On-device Virtual Try-on via Diffusion Models (2402.01877v1)
Abstract: The growing digital landscape of fashion e-commerce calls for interactive and user-friendly interfaces for virtually trying on clothes. Traditional try-on methods grapple with challenges in adapting to diverse backgrounds, poses, and subjects. While newer methods, utilizing the recent advances of diffusion models, have achieved higher-quality image generation, the human-centered dimensions of mobile interface delivery and privacy concerns remain largely unexplored. We present Mobile Fitting Room, the first on-device diffusion-based virtual try-on system. To address multiple inter-related technical challenges such as high-quality garment placement and model compression for mobile devices, we present a novel technical pipeline and an interface design that enables privacy preservation and user customization. A usage scenario highlights how our tool can provide a seamless, interactive virtual try-on experience for customers and provide a valuable service for fashion e-commerce businesses.
 