ViViD: Video Virtual Try-on using Diffusion Models (2405.11794v2)

Published 20 May 2024 in cs.CV

Abstract: Video virtual try-on aims to transfer a clothing item onto the video of a target person. Directly applying the technique of image-based try-on to the video domain in a frame-wise manner will cause temporal-inconsistent outcomes while previous video-based try-on solutions can only generate low visual quality and blurring results. In this work, we present ViViD, a novel framework employing powerful diffusion models to tackle the task of video virtual try-on. Specifically, we design the Garment Encoder to extract fine-grained clothing semantic features, guiding the model to capture garment details and inject them into the target video through the proposed attention feature fusion mechanism. To ensure spatial-temporal consistency, we introduce a lightweight Pose Encoder to encode pose signals, enabling the model to learn the interactions between clothing and human posture and insert hierarchical Temporal Modules into the text-to-image stable diffusion model for more coherent and lifelike video synthesis. Furthermore, we collect a new dataset, which is the largest, with the most diverse types of garments and the highest resolution for the task of video virtual try-on to date. Extensive experiments demonstrate that our approach is able to yield satisfactory video try-on results. The dataset, codes, and weights will be publicly available. Project page: https://becauseimbatman0.github.io/ViViD.

Citations (3)

Summary

  • The paper introduces a novel video virtual try-on framework that uses diffusion models to generate realistic garment motion and appearance.
  • It demonstrates improved temporal consistency and visual quality, outperforming earlier GAN-based methods on key metrics.
  • The approach has significant implications for e-commerce and fashion tech by enabling interactive, high-fidelity digital try-on experiences.

Overview of "Formatting Instructions For NeurIPS 2023"

The document titled "Formatting Instructions For NeurIPS 2023" is a guideline for authors preparing submissions to the NeurIPS 2023 conference. It outlines the formatting and submission procedures required to maintain a consistent, professional appearance across conference papers. Its detailed prescriptive content covers formatting style, citation practices, and the logistics of paper submission.

Key Specifications

The document is structured to provide authors with comprehensive instructions, covering various formatting aspects, all of which adhere to a standard ensuring coherence across submissions:

  1. Page Limits and Style: The paper mandates a maximum length of nine pages for content, including figures, with additional pages permissible only for references and acknowledgments. The use of the NeurIPS \LaTeX{} style files is obligatory, which the guidelines specify must be current and unaltered.
  2. Formatting: Specifics include a 10-point font size with a vertical spacing of 11 points, Times New Roman as the preferred typeface, and established margins. The emphasis on formatting rigor across aspects such as headings and paragraph styling reflects a commitment to uniformity and readability.
  3. Submission Instructions: The submission process is delineated with instructions on using the OpenReview platform, specifically highlighting the separate submission of checklists. Noteworthy is the requirement for submissions to be anonymized, a critical point for maintaining the integrity of the double-blind review process.
  4. Use of Figures and Tables: The document provides guidelines for the presentation of figures and tables, advocating for the use of clear, legible representation without the use of vertical lines in tables, and recommending the \verb+booktabs+ package for professional-quality result tables.
  5. Citations and References: The document prescribes the consistent use of citations using the \verb+natbib+ package and emphasizes the requirement for internal consistency in citation formats within the text.
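The table and citation conventions above can be illustrated with a minimal LaTeX sketch. This is not text from the official NeurIPS template: the bibliography key `example2023` and the file `references.bib` are placeholders, and the preamble is simplified to show only the \verb+booktabs+ and \verb+natbib+ usage the guidelines describe.

```latex
% Minimal sketch of the recommended conventions (hypothetical content,
% not the official NeurIPS style file).
\documentclass{article}
\usepackage[round]{natbib}   % provides \citep and \citet for consistent citations
\usepackage{booktabs}        % provides \toprule, \midrule, \bottomrule

\begin{document}

As shown by \citet{example2023}, results improve with scale.  % placeholder key

% A result table without vertical lines, as the guidelines recommend.
\begin{table}[t]
  \caption{Example result table set with booktabs rules.}
  \centering
  \begin{tabular}{lcc}
    \toprule
    Method   & Accuracy & F1   \\
    \midrule
    Baseline & 0.81     & 0.79 \\
    Ours     & 0.88     & 0.86 \\
    \bottomrule
  \end{tabular}
\end{table}

\bibliographystyle{plainnat}
% \bibliography{references}  % assumes a references.bib file exists

\end{document}
```

Note the absence of vertical rules in the tabular environment: booktabs deliberately provides only horizontal rules, which is what gives the "professional-quality" tables the instructions call for.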

Implications and Future Developments

This paper has significant implications for the NeurIPS community, ensuring that all submissions are evaluated on their quality rather than presentation discrepancies. The standardized formatting facilitates efficient and consistent peer-review processes and allows for a smooth transition to conference proceedings.

From a practical standpoint, adherence to these guidelines aids authors in presenting their research succinctly, and respect for the specifications contributes to the seamless integration of conference materials into archival systems. The expectations set forth for authors regarding figures, tables, and citations also underline the importance of accuracy and clarity in scientific communication.

Given the rapid advancements in AI research, and the growing volume of submissions to international conferences such as NeurIPS, these formatting instructions are likely to evolve. Future revisions might incorporate feedback from past conferences, technological advancements in document preparation, and increased automation in template adherence checks. These developments could streamline the submission process further, reducing the burden on authors while maintaining the rigorous standards necessary for high-quality scientific discourse.