Consistent Diffusion Meets Tweedie: Training Exact Ambient Diffusion Models with Noisy Data
This paper contributes to the landscape of diffusion models by presenting a framework for training these models using only noisy data. The proposed approach resolves a notable problem in the field: sampling from the uncorrupted data distribution when only corrupted data is available for training. This is accomplished through a double application of Tweedie's formula together with a consistency loss, overcoming limitations of prior methods that rely on approximations that degrade performance.
The authors begin by addressing the challenges associated with diffusion models and the risk of memorization, wherein models reproduce training data and raise ethical and privacy concerns. They propose training diffusion models on corrupted datasets as a potential remedy, an approach that stands to benefit areas where uncorrupted data is scarce or costly to obtain, such as medical imaging or astronomical imaging.
The key technical contributions of this work can be summarized as follows:
- Exact Ambient Score Matching Framework: The paper introduces an exact method for training diffusion models using only corrupted samples. It relies on a computationally efficient optimization problem that recovers the optimal denoiser at each noise level via a double application of Tweedie's formula. Specifically, the framework guarantees that the correct denoiser can be learned for all noise levels σ_t ≥ σ_n, where σ_n is the noise level already present in the corrupted training data (a code sketch of this objective follows the list).
- Consistency Loss for Lower Noise Levels: To extend the learned model to noise levels below σ_n, the authors incorporate a consistency loss. This lets the model learn at noise levels for which no direct supervision is available, enabling exact sampling from the target distribution (see the second sketch after the list).
- Addressing and Reducing Memorization: The paper presents empirical evidence of memorization in foundational diffusion models such as Stable Diffusion XL, showing that heavily corrupted versions of training images can be reconstructed with striking fidelity, indicating that those images were memorized during training. Training with the proposed method substantially reduces the extent of memorization, and thus the potential for data leakage.
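The following is a minimal sketch of the exact ambient objective for σ_t ≥ σ_n, assuming a variance-exploding parameterization x_t = x_0 + σ_t ε and a network `denoiser(x, sigma)` trained to predict E[x_0 | x_t]; the function name and training details are illustrative, not the authors' released implementation. The key step is that two applications of Tweedie's formula express E[x_noisy | x_t] as a convex combination of x_t and E[x_0 | x_t], yielding a regression target (the noisy observation itself) that is available during training.

```python
import torch

def ambient_tweedie_loss(denoiser, x_noisy, sigma_n, sigma_t):
    """Exact ambient score-matching loss for sigma_t >= sigma_n (illustrative).

    Two Tweedie identities for x_t = x_0 + sigma_t * eps:
        E[x_0     | x_t] = x_t + sigma_t**2                * score(x_t)
        E[x_noisy | x_t] = x_t + (sigma_t**2 - sigma_n**2) * score(x_t)
    Eliminating the score gives
        E[x_noisy | x_t] = (sigma_n**2 / sigma_t**2) * x_t
                         + (1 - sigma_n**2 / sigma_t**2) * E[x_0 | x_t],
    so the clean-image denoiser can be supervised using noisy data alone.
    """
    # Add *extra* noise to lift the already-noisy sample to level sigma_t.
    extra_std = (sigma_t**2 - sigma_n**2) ** 0.5
    x_t = x_noisy + extra_std * torch.randn_like(x_noisy)

    x0_hat = denoiser(x_t, sigma_t)            # network estimate of E[x_0 | x_t]
    w = sigma_n**2 / sigma_t**2
    xnoisy_hat = w * x_t + (1.0 - w) * x0_hat  # implied estimate of E[x_noisy | x_t]

    # Regress against the noisy observation we actually have.
    return ((xnoisy_hat - x_noisy) ** 2).mean()
```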
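For noise levels below σ_n there is no direct regression target, so the consistency loss exploits the fact that E[x_0 | x_t] is a martingale along the diffusion: the denoiser's prediction at a lower level, evaluated on a sample drawn from the model's own reverse process, should agree in expectation with the prediction at the higher level. The sketch below uses a single deterministic DDIM-style step to reach the lower level, which is a simplification; the paper specifies the exact sampling distribution and weighting.

```python
import torch

def consistency_loss(denoiser, x_t, sigma_t, sigma_lower):
    """Consistency loss propagating learning to levels sigma_lower < sigma_n (illustrative).

    The prediction at the trusted higher noise level sigma_t serves as the target;
    the denoiser must agree with it after moving to the lower level sigma_lower
    using the model's own estimate.
    """
    with torch.no_grad():
        target = denoiser(x_t, sigma_t)        # trusted prediction at the higher level
    # One deterministic DDIM-style step from sigma_t down to sigma_lower.
    x_lower = target + (sigma_lower / sigma_t) * (x_t - target)
    pred = denoiser(x_lower, sigma_lower)      # prediction at the lower, unsupervised level
    return ((pred - target) ** 2).mean()
```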
The paper's findings have significant implications in both theory and practice. Theoretically, the work pushes forward the boundary of what can be achieved in unsupervised learning from noisy data, clarifying how diffusion models can handle data corruption without resorting to approximation. Practically, it opens avenues for applying diffusion models in sensitive domains by addressing crucial issues of data privacy and integrity.
The experimental evaluation supports these contributions: models trained with the proposed method demonstrate strong denoising performance across multiple noise levels, and the fidelity of the generated images remains competitive even when the training data is heavily corrupted, attesting to the framework's robustness.
Future work may explore sparse and variably corrupted datasets more extensively and improve the computational efficiency of the proposed method for larger-scale applications. The work also encourages investigating how such training paradigms might transfer to other families of generative models beyond diffusion, with potential impact across broader AI domains.
The open-source code release further positions the research community to build upon and extend these results, fostering a deeper understanding of learning from noisy data in AI.