Impact of Data Corruption on Diffusion Models: Empirical and Theoretical Insights
Slight Corruption in Pre-training Data Makes Better Diffusion Models
The paper addresses an essential aspect of diffusion models (DMs): the impact of data corruption during the pre-training phase. DMs have demonstrated remarkable capability in generating high-quality images, audio, and video content. These models, however, depend heavily on large-scale data harvested from the web, which often contains noisy, inaccurate, or otherwise corrupted data pairs.
Empirical Evaluations and Findings
The authors present a thorough empirical study of how slight corruption in pre-training data affects DMs. By intentionally introducing synthetic corruptions into the ImageNet-1K (IN-1K) and CC3M datasets, they train and evaluate more than 50 conditional DMs. The results are counterintuitive and significant: slight corruption (up to 7.5%) can enhance the quality, diversity, and fidelity of generated content compared to models trained exclusively on clean data.
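As a concrete illustration of condition-level corruption, the sketch below randomly replaces a fraction of class labels with incorrect classes. The function name, signature, and sampling scheme are hypothetical stand-ins; the paper studies several corruption types beyond simple label flipping.

```python
import random

def corrupt_labels(labels, num_classes, corruption_ratio=0.075, seed=0):
    """Replace a fraction of class labels with a different, incorrect class.

    Hypothetical illustration of condition corruption: `corruption_ratio`
    of the labels are chosen at random, and each is swapped for a
    uniformly drawn class other than its original one.
    """
    rng = random.Random(seed)
    corrupted = list(labels)
    n_corrupt = int(len(labels) * corruption_ratio)
    for i in rng.sample(range(len(labels)), n_corrupt):
        choices = [c for c in range(num_classes) if c != corrupted[i]]
        corrupted[i] = rng.choice(choices)
    return corrupted
```

At a 7.5% ratio, 7 or 8 labels in every 100 are flipped, matching the "slight" regime the paper finds beneficial.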
Key Findings
- Enhanced Quality: Models pre-trained with slightly corrupted data achieve lower Fréchet Inception Distance (FID) and higher Inception Score (IS) and CLIP scores.
- Increased Diversity: Corrupted models show higher entropy, indicating a more diverse sample distribution. The Relative Mahalanobis Distance (RMD) score also highlights higher image complexity and diversity.
- Downstream Personalization: Models pre-trained on slightly corrupted data also perform better in downstream personalization tasks, such as ControlNet and T2I-Adapter fine-tuning on the IN-100 dataset.
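To make the FID metric above concrete, here is a simplified sketch that assumes diagonal covariances. Real FID fits full-covariance Gaussians to Inception-v3 features of real and generated images, where the covariance term requires a matrix square root; with diagonal covariances it reduces to elementwise operations.

```python
import numpy as np

def diagonal_fid(mu1, var1, mu2, var2):
    """Fréchet distance between two Gaussians with diagonal covariances.

    Simplified illustration of FID: mean term plus a covariance term;
    for diagonal covariances the matrix square root in the full formula
    becomes an elementwise sqrt of the variance product.
    """
    mu1, var1, mu2, var2 = map(np.asarray, (mu1, var1, mu2, var2))
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.sum(var1 + var2 - 2.0 * np.sqrt(var1 * var2))
    return float(mean_term + cov_term)
```

Identical feature distributions give a distance of zero; lower values indicate generated samples whose feature statistics are closer to the real data's.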
Theoretical Analysis
In addition to empirical findings, the paper offers a theoretical framework based on Gaussian mixture models to further substantiate its claims. The authors present two crucial theorems:
- Generation Diversity: Theorem 1 demonstrates that slight corruption increases the entropy of the generated distribution $z_T$ relative to the clean-condition case, resulting in greater diversity in the generated images.
- Generation Quality: Theorem 2 shows that slight corruption decreases the $2$-Wasserstein distance between the generated and real data distributions, leading to higher-quality generated content.
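A toy 1-D calculation illustrates the intuition behind the diversity result: if slight corruption adds a small independent perturbation to the condition, the variance of the generated distribution grows, and the differential entropy of a Gaussian grows with its variance. This is an illustrative simplification with made-up variances, not the paper's Gaussian-mixture proof.

```python
import math

def gaussian_entropy(var):
    # Differential entropy (in nats) of a 1-D Gaussian with variance `var`:
    # H = 0.5 * log(2 * pi * e * var).
    return 0.5 * math.log(2 * math.pi * math.e * var)

# Hypothetical setup: clean generation has unit variance; slight condition
# corruption adds a small independent perturbation, so variances add.
clean_entropy = gaussian_entropy(1.0)
corrupted_entropy = gaussian_entropy(1.0 + 0.1)
```

The corrupted distribution always has strictly higher entropy here, mirroring (in miniature) the diversity gain of Theorem 1.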
Methodology: Conditional Embedding Perturbation (CEP)
Inspired by the empirical and theoretical findings, the authors propose a novel method termed Conditional Embedding Perturbation (CEP). CEP introduces perturbations in the conditional embeddings during training:
- Implementation: CEP modifies the DM training objective by adding either uniform or Gaussian noise to the conditional embeddings.
- Performance: CEP significantly outperforms related methods such as Input Perturbation (IP), yielding better conditional diffusion models across various tasks and metrics.
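A minimal sketch of the idea, assuming hypothetical names (`gamma` for the noise scale, NumPy arrays standing in for embedding tensors) rather than the paper's exact implementation; during training, the perturbed embedding would replace the clean one in the usual denoising objective.

```python
import numpy as np

def perturb_condition(embedding, gamma=0.1, noise_type="gaussian", seed=0):
    """CEP-style perturbation of a conditional embedding (illustrative).

    Adds scaled Gaussian or uniform noise to the embedding; `gamma`
    controls the perturbation strength and is a hypothetical name.
    """
    rng = np.random.default_rng(seed)
    if noise_type == "gaussian":
        noise = rng.standard_normal(embedding.shape)
    elif noise_type == "uniform":
        noise = rng.uniform(-1.0, 1.0, size=embedding.shape)
    else:
        raise ValueError(f"unknown noise_type: {noise_type}")
    return embedding + gamma * noise
```

The design mirrors the empirical finding: rather than relying on corruption already present in the data, CEP injects a controlled amount of noise into the condition at every training step.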
Results with CEP
Training DMs with CEP yields substantial improvements in both pre-training performance and downstream personalization:
- Pre-training: Results on IN-1K and CC3M indicate that CEP improves FID, IS, and Precision-Recall metrics compared to baseline models.
- Personalization: When applied to personalization tasks (e.g., with ControlNet), CEP enhances model performance, producing more reliable and visually appealing images in downstream applications.
Practical and Theoretical Implications
This paper has several significant implications:
- Practical: Given the unavoidable presence of data corruption in large-scale datasets, incorporating CEP during pre-training can enhance DM performance without the need for perfect data.
- Theoretical: The findings call for a re-examination of the conventional wisdom that clean data always yield the best models. Slight corruption can act as a form of implicit regularization, helping prevent overfitting.
Future Directions
The research opens various future avenues, including:
- Expansion to Other Modalities: Extending the findings to audio and video diffusion models.
- Robustness in Real-world Data: Applications in domain-specific scenarios like autonomous driving and healthcare, where data corruption is prevalent but high-quality performance is critical.
- Adapting Theoretical Models: Refining theoretical models to better capture the nuanced behavior of DMs under data corruption.
Conclusion
The paper challenges the conventional belief that data corruption necessarily degrades model performance. Instead, it shows that slight data corruption during pre-training can be beneficial, improving the generalization capability and diversity of diffusion models. The Conditional Embedding Perturbation (CEP) technique proposed by the authors offers a straightforward yet effective way to harness this phenomenon, leading to better-performing diffusion models across a broad array of applications. This work may influence future research and practical implementations in diffusion modeling and beyond.