
Bring Metric Functions into Diffusion Models (2401.02414v1)

Published 4 Jan 2024 in cs.CV

Abstract: We introduce a Cascaded Diffusion Model (Cas-DM) that improves a Denoising Diffusion Probabilistic Model (DDPM) by effectively incorporating additional metric functions in training. Metric functions such as the LPIPS loss have been proven highly effective in consistency models derived from the score matching. However, for the diffusion counterparts, the methodology and efficacy of adding extra metric functions remain unclear. One major challenge is the mismatch between the noise predicted by a DDPM at each step and the desired clean image that the metric function works well on. To address this problem, we propose Cas-DM, a network architecture that cascades two network modules to effectively apply metric functions to the diffusion model training. The first module, similar to a standard DDPM, learns to predict the added noise and is unaffected by the metric function. The second cascaded module learns to predict the clean image, thereby facilitating the metric function computation. Experiment results show that the proposed diffusion model backbone enables the effective use of the LPIPS loss, leading to state-of-the-art image quality (FID, sFID, IS) on various established benchmarks.


Summary

  • The paper introduces a Cascaded Diffusion Model (Cas-DM) that integrates LPIPS loss into DDPM to improve image generation.
  • It employs a dual-module architecture that separates noise prediction from clean image estimation, enabling effective metric function integration.
  • Experiments on CIFAR10, CelebA-HQ, and ImageNet show improvements in FID, sFID, and Inception Score over prior diffusion baselines.

Introduction

Visual content generation has advanced markedly with the advent of the Denoising Diffusion Probabilistic Model (DDPM), which generates images through an iterative denoising process. A notable recent development is the use of metric functions, specifically the Learned Perceptual Image Patch Similarity (LPIPS) loss, to improve image generation quality. Although effective in consistency models, integrating LPIPS into DDPM training has remained challenging: at each step the model predicts noise, whereas the metric function is designed to operate on a clean image.
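The mismatch above can be made concrete with the standard DDPM forward process. The sketch below is illustrative only and uses NumPy in place of a real network; the linear beta schedule and the closed-form inversion from a noise prediction back to a clean-image estimate are standard DDPM relations, not details specific to this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Standard DDPM noise schedule: beta_t increases linearly, and
# alpha_bar_t is the cumulative product of (1 - beta_t).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def q_sample(x0, t, eps):
    """Forward process: noise a clean image x0 to timestep t."""
    ab = alpha_bars[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps

def predict_x0(x_t, t, eps_hat):
    """Closed-form inversion: recover a clean-image estimate from a
    noise prediction eps_hat. A trained DDPM outputs eps_hat; a
    perceptual metric like LPIPS needs this x0-space quantity."""
    ab = alpha_bars[t]
    return (x_t - np.sqrt(1.0 - ab) * eps_hat) / np.sqrt(ab)

# A toy "clean image" and one sampled noise tensor.
x0 = rng.standard_normal((3, 32, 32))
eps = rng.standard_normal(x0.shape)
t = 500

x_t = q_sample(x0, t, eps)

# With the true noise, the inversion recovers x0 exactly.
print(np.allclose(predict_x0(x_t, t, eps), x0))  # True
```

When `eps_hat` comes from an imperfectly trained network rather than the true noise, the recovered `x0` estimate is coarse, which is why applying LPIPS directly to it is problematic and motivates the cascaded design below.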

Cascaded Diffusion Model (Cas-DM)

To address this integration problem, the paper introduces the Cascaded Diffusion Model (Cas-DM), an architecture that incorporates metric functions into diffusion model training without disturbing the noise-prediction objective. Cas-DM cascades two network modules: the first operates like a standard DDPM and predicts the added noise, while the second refines a prediction of the clean image on which the metric function can be computed. This separation of tasks is central to Cas-DM's ability to employ metric functions while preserving the original DDPM behavior.
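A minimal sketch of how such a cascaded training loss could be wired up is shown below. This is an assumption-laden illustration, not the paper's implementation: both modules are stand-in functions (a real system would use U-Nets), plain MSE stands in for LPIPS, and the loss weight `lam` is a hypothetical hyperparameter.

```python
import numpy as np

rng = np.random.default_rng(1)

T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def module_one(x_t, t):
    # Stand-in for the first, DDPM-like module that predicts the
    # added noise. A real implementation would be a U-Net.
    return np.zeros_like(x_t)

def module_two(x0_coarse, x_t, t):
    # Stand-in for the cascaded second module that refines the
    # clean-image estimate; identity placeholder here.
    return x0_coarse

def metric_loss(a, b):
    # Placeholder for a perceptual metric such as LPIPS; plain MSE here.
    return float(np.mean((a - b) ** 2))

def cas_dm_loss(x0, t, lam=0.1):
    """Combined training loss: DDPM noise loss on module one plus a
    metric loss on module two's refined clean-image prediction.
    (In the paper, the metric loss does not affect the first module;
    a real implementation would stop gradients accordingly.)"""
    eps = rng.standard_normal(x0.shape)
    ab = alpha_bars[t]
    x_t = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps

    eps_hat = module_one(x_t, t)
    loss_eps = float(np.mean((eps - eps_hat) ** 2))  # standard DDPM loss

    # Invert the forward process for a coarse clean-image estimate,
    # then let the second module refine it for the metric loss.
    x0_coarse = (x_t - np.sqrt(1.0 - ab) * eps_hat) / np.sqrt(ab)
    x0_hat = module_two(x0_coarse, x_t, t)
    loss_metric = metric_loss(x0_hat, x0)

    return loss_eps + lam * loss_metric

x0 = rng.standard_normal((3, 32, 32))
loss = cas_dm_loss(x0, t=500)
```

The key design point the sketch captures is that the noise-prediction loss and the metric loss attach to different modules, so the metric function can operate in clean-image space without perturbing the standard DDPM objective.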

Experimentation and Results

The effectiveness of the LPIPS loss in Cas-DM is validated through experiments on several benchmarks, including CIFAR10, CelebA-HQ, and ImageNet, with improvements in FID, sFID, and Inception Score. The results show that Cas-DM is not only a viable extension of DDPM but also achieves state-of-the-art image quality compared with previous diffusion models.

Implications for Future Work

The findings open avenues for further exploration of architecture optimizations and of other metric functions for diffusion model training. Cas-DM's applicability across image resolutions and dataset complexities, together with its ability to incorporate perceptually grounded metrics, makes it a promising foundation for future generative modeling work.
