InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity (2503.16418v1)

Published 20 Mar 2025 in cs.CV and cs.LG

Abstract: Achieving flexible and high-fidelity identity-preserved image generation remains formidable, particularly with advanced Diffusion Transformers (DiTs) like FLUX. We introduce InfiniteYou (InfU), one of the earliest robust frameworks leveraging DiTs for this task. InfU addresses significant issues of existing methods, such as insufficient identity similarity, poor text-image alignment, and low generation quality and aesthetics. Central to InfU is InfuseNet, a component that injects identity features into the DiT base model via residual connections, enhancing identity similarity while maintaining generation capabilities. A multi-stage training strategy, including pretraining and supervised fine-tuning (SFT) with synthetic single-person-multiple-sample (SPMS) data, further improves text-image alignment, ameliorates image quality, and alleviates face copy-pasting. Extensive experiments demonstrate that InfU achieves state-of-the-art performance, surpassing existing baselines. In addition, the plug-and-play design of InfU ensures compatibility with various existing methods, offering a valuable contribution to the broader community.

Summary

InfiniteYou: Identity-Preserving Image Generation Using Diffusion Transformers

The paper "InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity" presents a framework for identity-preserving image generation built on Diffusion Transformers (DiTs). The framework, named InfiniteYou (InfU), addresses significant challenges in existing methods, including low identity similarity, weak text-image alignment, and compromised generation quality. It builds on state-of-the-art DiTs such as FLUX to enhance image synthesis while maintaining robust identity features.

Core Contributions

  1. InfuseNet Architecture: Central to the InfU framework is the InfuseNet component, which introduces identity features into the DiT base model using residual connections. This architecture preserves identity similarity more effectively than conventional methods that typically alter attention layers. InfuseNet functions as a generalization of ControlNet, specifically designed to separate text and identity inputs to avoid entanglement.
  2. Multi-Stage Training Strategy: To further enhance model performance, the authors implement a multi-stage training strategy comprising pretraining and supervised fine-tuning (SFT). This methodology utilizes synthetic single-person-multiple-sample (SPMS) data to improve text-image alignment and overall generation quality, ultimately rectifying issues like face copy-pasting.
  3. Compatibility and Plug-and-Play Design: InfU features a plug-and-play design, ensuring compatibility with various existing methods. This design contributes to its adaptability and usefulness across diverse scenarios, promoting integration with existing image generation models and plugins.
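The residual-injection idea behind InfuseNet can be illustrated with a toy sketch (NumPy only, not the paper's code; the dimensions, the random weights, and the `run_dit` helper are all hypothetical): identity features are projected into the hidden space and added as residuals at each DiT block, so the frozen base model's own computation, and the text conditioning flowing through it, is left untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN, ID_DIM, TOKENS, BLOCKS = 64, 32, 16, 4

# Frozen DiT base: each block is a toy linear layer with a residual path.
block_weights = [rng.standard_normal((HIDDEN, HIDDEN)) * 0.02 for _ in range(BLOCKS)]

# InfuseNet-style projections: one per block, mapping identity features
# into the hidden space so they can be added as residuals.
id_proj = [rng.standard_normal((ID_DIM, HIDDEN)) * 0.02 for _ in range(BLOCKS)]

def run_dit(hidden, id_feat=None):
    """Run the toy DiT; optionally inject identity residuals per block."""
    for w, p in zip(block_weights, id_proj):
        hidden = hidden + hidden @ w          # base block (residual form)
        if id_feat is not None:
            hidden = hidden + id_feat @ p     # identity residual injection
    return hidden

tokens = rng.standard_normal((TOKENS, HIDDEN))   # image/text token states
identity = rng.standard_normal((1, ID_DIM))      # face-identity embedding

plain = run_dit(tokens)
conditioned = run_dit(tokens, identity)
```

Because the identity signal enters only through additive residuals, the projection layers can be trained while the base DiT stays frozen, which is what lets the design avoid altering attention layers.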

Empirical Findings and Implications

The paper reports extensive experiments highlighting InfU's superiority over existing baselines, such as PuLID-FLUX and FLUX.1-dev IP-Adapters. Key evaluation metrics include ID Loss, CLIPScore, and PickScore, with InfU achieving notable improvements in identity similarity, text-image alignment, and image aesthetics.
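Identity similarity of this kind is typically measured as the cosine similarity between face-recognition embeddings of the reference and generated faces. A minimal sketch (NumPy; the 512-dimensional embeddings and the face-encoder step are assumptions for illustration, not taken from the paper):

```python
import numpy as np

def identity_similarity(emb_ref: np.ndarray, emb_gen: np.ndarray) -> float:
    """Cosine similarity between two face embeddings; higher means the
    generated face better preserves the reference identity."""
    a = emb_ref / np.linalg.norm(emb_ref)
    b = emb_gen / np.linalg.norm(emb_gen)
    return float(a @ b)

# Hypothetical 512-d embeddings, e.g. from a face recognizer such as ArcFace.
rng = np.random.default_rng(7)
ref = rng.standard_normal(512)
same = ref + 0.05 * rng.standard_normal(512)   # near-identical face
other = rng.standard_normal(512)               # unrelated face

sim_same = identity_similarity(ref, same)      # close to 1.0
sim_other = identity_similarity(ref, other)    # close to 0.0
```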

The plug-and-play characteristic of InfU greatly enhances its practical utility. It operates smoothly with FLUX variants and is compatible with ControlNets, LoRAs, and IP-Adapters, underpinning its versatility in real-world applications. Such adaptability is not only technically appealing but also benefits the broader community by facilitating wider adoption and further advances in the field.
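One reason purely residual conditioning composes well with such plugins is that each module contributes its own additive term to the shared hidden state, without touching the base weights or the other modules. A schematic sketch (NumPy; both residual functions are illustrative stand-ins, not real module implementations):

```python
import numpy as np

rng = np.random.default_rng(1)
HIDDEN = 8
hidden = rng.standard_normal((4, HIDDEN))    # token hidden states in one block

# Each plug-in contributes an additive residual, so modules stack without
# modifying the frozen base model or interfering with one another.
def infusenet_residual(h):
    """Identity conditioning (illustrative stand-in)."""
    return 0.1 * rng.standard_normal(h.shape)

def controlnet_residual(h):
    """Spatial conditioning (illustrative stand-in)."""
    return 0.1 * rng.standard_normal(h.shape)

combined = hidden + infusenet_residual(hidden) + controlnet_residual(hidden)
```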

Future Directions

While the results demonstrate InfU’s robust performance, future developments may aim to enhance its scalability and efficiency further. Additionally, exploring its applications beyond traditional portrait generation tasks, such as in avatars and virtual environments, might expand its scope. The framework's design principles can also inform future models that integrate identity preservation with newer generative techniques.

Conclusion

InfiniteYou represents a significant step forward in identity-preserved image generation, leveraging the capabilities of Diffusion Transformers to exceed previous limitations. The introduction of InfuseNet and a sophisticated training regimen underscores the potential of DiTs in enhancing personalized content creation. As applications broaden and technology evolves, frameworks like InfU pave the way for more sophisticated, identity-sensitive image generation.
