An Overview of ∞-Diff: Infinite Resolution Diffusion with Subsampled Mollified States
The paper introduces ∞-Diff, a generative diffusion model that operates in an infinite-dimensional Hilbert space. Its central claim is the ability to model infinite-resolution data, achieved through subsampling and mollification techniques that stabilize the diffusion process in function space. These advances position ∞-Diff as a significant step forward in generative modeling, particularly in scenarios requiring high-resolution outputs beyond the reach of finite-dimensional models.
Key Contributions and Methodology
The ∞-Diff model is presented as an extension of denoising diffusion probabilistic models (DDPMs) to infinite dimensions. Traditional DDPMs operate by gradually adding Gaussian noise to data and learning a reverse denoising process that approximates the data distribution; a minimal sketch of this forward noising step appears below. ∞-Diff recasts this procedure in a Hilbert space framework, allowing the model to train on and generate data at arbitrarily high resolutions.
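To make the finite-dimensional baseline concrete, here is the standard DDPM forward (noising) step in PyTorch, using the closed-form marginal q(x_t | x_0) = N(sqrt(ᾱ_t) x_0, (1 − ᾱ_t) I). The function name and schedule constants are illustrative, not taken from the paper; ∞-Diff generalizes this Gaussian perturbation from finite vectors to functions in a Hilbert space.

```python
import torch

def ddpm_forward_sample(x0, t, alpha_bar):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    abar_t = alpha_bar[t].view(-1, *([1] * (x0.dim() - 1)))  # broadcast over data dims
    noise = torch.randn_like(x0)
    xt = abar_t.sqrt() * x0 + (1.0 - abar_t).sqrt() * noise
    return xt, noise  # `noise` is the regression target for the denoiser

# Linear beta schedule, as in the original DDPM paper
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

x0 = torch.randn(8, 3, 32, 32)    # stand-in batch of "clean" data
t = torch.randint(0, T, (8,))     # a random timestep per sample
xt, eps = ddpm_forward_sample(x0, t, alpha_bar)
```

A denoising network is then trained to predict `eps` from `xt` and `t`; sampling runs the learned reverse process from pure noise.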
- Integration of Infinite-Dimensional Hilbert Spaces: By defining the diffusion process in an infinite-dimensional space, ∞-Diff can in principle model data at any resolution. This broader state space is handled with non-local integral operators (a toy example is sketched after this list), distinguishing the approach from earlier neural-field models that rely on point-wise function evaluations and latent-space compression.
- Subsampling and Mollification: The model trains on random subsets of coordinates, which reduces computation and lets the function-space architecture concentrate its denoising capacity on observed data points. A Gaussian mollifier kernel regularizes the diffusion states so that they remain in the underlying function space, preventing the instability that raw white noise would introduce (a sketch of both ideas also follows the list).
- Neural Operator-Based Architecture: Neural operators give ∞-Diff the ability to map between functions rather than fixed-size arrays, efficiently modeling input-output relationships in function space. This improves on models built from convolutional neural networks, which are typically constrained by fixed-grid assumptions.
- Efficient Multi-Scale Handling: The architecture combines sparse coordinate operations for localized detail with a grid-based component for global context, balancing resolution independence against practical memory and compute budgets.
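To illustrate what a non-local integral operator looks like in code, below is a minimal Monte-Carlo kernel-integral layer that acts on function values sampled at arbitrary coordinates. This is a generic toy construction, not the paper's architecture; the class name, the MLP kernel parameterization, and the tensor shapes are all assumptions.

```python
import torch
import torch.nn as nn

class KernelIntegralLayer(nn.Module):
    """Toy non-local operator: (K u)(x_i) ~ (1/N) * sum_j k(x_i, y_j) u(y_j).

    The kernel k is a small MLP over coordinate pairs, so the layer is
    defined at any set of sample points rather than on a fixed grid.
    (Illustrative construction only; not the paper's architecture.)
    """

    def __init__(self, channels: int, hidden: int = 64):
        super().__init__()
        self.channels = channels
        self.kernel = nn.Sequential(
            nn.Linear(4, hidden),  # input: an (x, y) coordinate pair in R^2 x R^2
            nn.GELU(),
            nn.Linear(hidden, channels * channels),
        )

    def forward(self, u: torch.Tensor, coords: torch.Tensor) -> torch.Tensor:
        # u: (B, N, C) function values at N sampled points
        # coords: (B, N, 2) their locations in [0, 1]^2
        b, n, c = u.shape
        pairs = torch.cat(
            (
                coords.unsqueeze(2).expand(b, n, n, 2),  # x_i repeated over j
                coords.unsqueeze(1).expand(b, n, n, 2),  # y_j repeated over i
            ),
            dim=-1,
        )                                           # (B, N, N, 4)
        k = self.kernel(pairs).view(b, n, n, c, c)  # k(x_i, y_j) as C x C matrices
        # Monte-Carlo estimate of the integral over y
        return torch.einsum("bijcd,bjd->bic", k, u) / n
```

Because the integral is estimated from whatever points are sampled, the same weights apply at any resolution; the cost here is O(N^2), which is why practical architectures add multi-scale and grid-based structure.

The subsampling and mollification ideas can likewise be sketched for image data on an (H, W) grid: draw a random subset of pixel coordinates to train on, and smooth states with a Gaussian kernel so they behave like well-regularized functions. The helper names and the truncated separable blur are illustrative assumptions, not the paper's exact mollifier.

```python
import torch
import torch.nn.functional as F

def gaussian_kernel1d(sigma: float, radius: int) -> torch.Tensor:
    """Normalized 1-D Gaussian; applied separably as a mollifier."""
    x = torch.arange(-radius, radius + 1, dtype=torch.float32)
    k = torch.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def mollify(x: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Gaussian-smooth a batch of images (B, C, H, W) with two separable passes."""
    radius = max(1, int(3 * sigma))
    k = gaussian_kernel1d(sigma, radius).to(x.device, x.dtype)
    c = x.shape[1]
    kx = k.view(1, 1, 1, -1).repeat(c, 1, 1, 1)  # horizontal pass, one filter per channel
    ky = k.view(1, 1, -1, 1).repeat(c, 1, 1, 1)  # vertical pass
    x = F.conv2d(x, kx, padding=(0, radius), groups=c)
    x = F.conv2d(x, ky, padding=(radius, 0), groups=c)
    return x

def subsample_coords(h: int, w: int, n_keep: int) -> torch.Tensor:
    """Draw a random subset of pixel coordinates to train on this step."""
    idx = torch.randperm(h * w)[:n_keep]
    return torch.stack((idx // w, idx % w), dim=-1)  # (n_keep, 2) row/col pairs

# Example: mollified noise evaluated at a random 25% of coordinates
# (the rate is chosen arbitrarily for illustration)
noise = mollify(torch.randn(4, 3, 64, 64), sigma=1.5)
coords = subsample_coords(64, 64, n_keep=64 * 64 // 4)
values = noise[..., coords[:, 0], coords[:, 1]]  # (4, 3, n_keep)
```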
Experimental Validation and Results
In empirical tests, ∞-Diff shows compelling performance on several high-resolution datasets, including CelebA-HQ, FFHQ, and LSUN Church. Notably, even when training on only a subset of coordinates, ∞-Diff produces samples with low Fréchet Inception Distance (FID) scores, indicating high quality alongside computational savings. The model's capacity for continuous resolution scaling is demonstrated by consistent performance across varied resolutions, reflecting its discretization-invariant design.
Implications and Future Directions
The advances in ∞-Diff have practical implications for fields that rely on scalable, highly detailed generative models, such as computer graphics, virtual reality, and high-resolution image synthesis. Theoretically, the work sets a precedent for further applications of infinite-dimensional frameworks in machine learning, potentially extending beyond image generation to audio processing or larger-scale simulations.
As the discussion indicates, integrating more sophisticated neural operators, improving the efficiency of sparse computation, and leveraging recent advances in diffusion models could further improve the scalability and quality of infinite-dimensional generative models. Incorporating adaptive strategies into the reverse diffusion process also remains a promising avenue for refinement.
In conclusion, ∞-Diff represents a noteworthy evolution of diffusion models, pushing them toward effective use of infinite-dimensional spaces for generative tasks. The work provides a robust blueprint for future research while demonstrating the practical value of pairing theoretical advances with empirical rigor.