
EGSDE: Unpaired Image-to-Image Translation via Energy-Guided Stochastic Differential Equations (2207.06635v5)

Published 14 Jul 2022 in cs.CV

Abstract: Score-based diffusion models (SBDMs) have achieved the SOTA FID results in unpaired image-to-image translation (I2I). However, we notice that existing methods totally ignore the training data in the source domain, leading to sub-optimal solutions for unpaired I2I. To this end, we propose energy-guided stochastic differential equations (EGSDE) that employs an energy function pretrained on both the source and target domains to guide the inference process of a pretrained SDE for realistic and faithful unpaired I2I. Building upon two feature extractors, we carefully design the energy function such that it encourages the transferred image to preserve the domain-independent features and discard domain-specific ones. Further, we provide an alternative explanation of the EGSDE as a product of experts, where each of the three experts (corresponding to the SDE and two feature extractors) solely contributes to faithfulness or realism. Empirically, we compare EGSDE to a large family of baselines on three widely-adopted unpaired I2I tasks under four metrics. EGSDE not only consistently outperforms existing SBDMs-based methods in almost all settings but also achieves the SOTA realism results without harming the faithful performance. Furthermore, EGSDE allows for flexible trade-offs between realism and faithfulness and we improve the realism results further (e.g., FID of 51.04 in Cat to Dog and FID of 50.43 in Wild to Dog on AFHQ) by tuning hyper-parameters. The code is available at https://github.com/ML-GSAI/EGSDE.

Authors (4)
  1. Min Zhao (42 papers)
  2. Fan Bao (30 papers)
  3. Chongxuan Li (75 papers)
  4. Jun Zhu (426 papers)
Citations (158)

Summary

  • The paper introduces an energy-guided SDE framework that overcomes the sub-optimal use of source domain images in traditional score-based diffusion models.
  • It employs dual feature extractors to disentangle domain-independent and domain-specific features, enhancing both image realism and fidelity.
  • Empirical results on AFHQ and CelebA-HQ datasets demonstrate state-of-the-art performance with competitive FID scores across various unpaired translation tasks.

Overview of EGSDE for Unpaired Image-to-Image Translation

The paper presents Energy-Guided Stochastic Differential Equations (EGSDE), a method for unpaired image-to-image translation (I2I) that uses an energy function to guide sampling. Unpaired I2I transforms images from a source domain to a target domain without paired examples between the two. According to the authors, existing score-based diffusion approaches ignore the training data in the source domain, which leads to sub-optimal solutions in terms of realism and faithfulness.

Introduction to EGSDE

EGSDE uses an energy function pretrained on both the source and target domains to guide the inference process of a pretrained stochastic differential equation (SDE). This addresses the fact that existing score-based diffusion models (SBDMs) make no use of source-domain training data. The energy function is designed so that the translated image preserves domain-independent features and discards domain-specific ones.

Methodological Insights

At the core of EGSDE are two feature extractors: one for domain-independent features and another for domain-specific features. These extractors define an energy function that decomposes into two log-potential terms, each guiding the translation. The guidance acts while sampling from the reverse-time SDE, steering the trajectory toward images that are both realistic and faithful to the source.
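The guided reverse-time sampling described above can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the score function, the two "feature extractors" (a simple split of the vector into halves), the energy form, the weights `lam_s` and `lam_i`, and the constant `beta` are all assumptions made for illustration. The sketch follows the common convention that guidance subtracts the energy gradient from the score, so sampling drifts toward low-energy states.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b + 1e-12)


def energy(y, x_src, lam_s=1.0, lam_i=1.0):
    """Toy energy: low when y differs from x_src on 'domain-specific'
    coordinates and matches it on 'domain-independent' ones."""
    half = len(y) // 2
    # Hypothetical split: first half is specific, second half is independent.
    e_specific = cosine_similarity(y[:half], x_src[:half])
    e_independent = sum((p - q) ** 2 for p, q in zip(y[half:], x_src[half:]))
    return lam_s * e_specific + lam_i * e_independent


def numerical_grad(fn, y, eps=1e-5):
    """Central finite-difference gradient (a stand-in for autodiff)."""
    grad = []
    for k in range(len(y)):
        up = y[:k] + [y[k] + eps] + y[k + 1:]
        down = y[:k] + [y[k] - eps] + y[k + 1:]
        grad.append((fn(up) - fn(down)) / (2 * eps))
    return grad


def toy_score(y):
    """Score of a standard normal; stands in for a pretrained score network."""
    return [-v for v in y]


def guided_reverse_step(y, x_src, dt, beta=1.0, noise_scale=1.0, rng=random):
    """One Euler-Maruyama step backward in time for
    dy = [f - g^2 (score - grad E)] dt + g dw,
    with f = -0.5 * beta * y and g = sqrt(beta) (assumed toy SDE)."""
    grad_e = numerical_grad(lambda v: energy(v, x_src), y)
    score = toy_score(y)
    new_y = []
    for k in range(len(y)):
        drift = -0.5 * beta * y[k] - beta * (score[k] - grad_e[k])
        noise = noise_scale * math.sqrt(beta * dt) * rng.gauss(0.0, 1.0)
        new_y.append(y[k] - drift * dt + noise)
    return new_y


# Usage: one noise-free step from a state far from the source image.
source = [0.0, 1.0, 1.0, 1.0]
state = [1.0, 0.0, 5.0, 5.0]
next_state = guided_reverse_step(state, source, dt=0.01, noise_scale=0.0)
print(energy(state, source), energy(next_state, source))
```

In this toy setting, raising `lam_i` pulls the sample toward the source on the "independent" coordinates (faithfulness), while `lam_s` pushes it away from the source on the "specific" ones. That loosely mirrors the trade-off the summary attributes to the two extractors.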

Furthermore, EGSDE admits an interpretation as a product of experts, in which each of the three experts (corresponding to the SDE and the two feature extractors) contributes solely to either realism or faithfulness. This formulation clarifies the role each component plays in the final output.
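A generic way to write down a three-expert product of this kind is sketched below. The notation is an assumption made for illustration and may differ from the paper's: $p(y)$ denotes the distribution defined by the pretrained SDE, $E_s$ and $E_i$ denote energy terms built on the domain-specific and domain-independent extractors, $\lambda_s$ and $\lambda_i$ are their weights, $x_0$ is the source image, and the time index is omitted.

```latex
\[
\tilde{p}(y \mid x_0) \;\propto\;
\underbrace{p(y)}_{\text{pretrained SDE}}
\cdot
\underbrace{\exp\!\big(-\lambda_s E_s(y, x_0)\big)}_{\text{domain-specific extractor}}
\cdot
\underbrace{\exp\!\big(-\lambda_i E_i(y, x_0)\big)}_{\text{domain-independent extractor}}
\]

\[
\nabla_y \log \tilde{p}(y \mid x_0)
= \nabla_y \log p(y)
- \lambda_s \nabla_y E_s(y, x_0)
- \lambda_i \nabla_y E_i(y, x_0)
\]
```

Taking the logarithm turns the product into a sum, so the gradient of the combined log-density is the pretrained score minus the weighted energy gradients. Under these assumptions, that is why each expert can be read as an additive guidance term during sampling.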

Empirical Results

The authors evaluate EGSDE on three unpaired I2I tasks under four metrics: Cat to Dog and Wild to Dog on AFHQ, and Male to Female on CelebA-HQ. EGSDE outperforms existing SBDM-based methods in almost all settings and achieves state-of-the-art realism without harming faithfulness. With tuned hyperparameters, realism improves further, reaching a Fréchet Inception Distance (FID) of 51.04 on Cat to Dog and 50.43 on Wild to Dog.
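For readers unfamiliar with the realism metric, the Fréchet distance behind FID compares the mean and covariance of two sets of feature vectors: $\lVert \mu_a - \mu_b \rVert^2 + \mathrm{Tr}\big(\Sigma_a + \Sigma_b - 2(\Sigma_a \Sigma_b)^{1/2}\big)$. The sketch below is a simplified illustration, not the evaluation code used in the paper: it assumes diagonal covariances (so the trace term reduces to a per-dimension sum) and takes plain lists of numbers in place of Inception-network features. The function names are hypothetical.

```python
import math


def mean_and_var(samples):
    """Per-dimension mean and unbiased variance of a list of vectors."""
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(s[k] for s in samples) / n for k in range(dim)]
    var = [sum((s[k] - mean[k]) ** 2 for s in samples) / (n - 1)
           for k in range(dim)]
    return mean, var


def frechet_distance_diagonal(feats_a, feats_b):
    """Frechet distance between two Gaussians fitted to the feature sets,
    under the simplifying assumption of diagonal covariances."""
    mu_a, var_a = mean_and_var(feats_a)
    mu_b, var_b = mean_and_var(feats_b)
    mean_term = sum((p - q) ** 2 for p, q in zip(mu_a, mu_b))
    # With diagonal covariances, Tr(A + B - 2 (A B)^(1/2)) is a simple sum.
    cov_term = sum(va + vb - 2.0 * math.sqrt(va * vb)
                   for va, vb in zip(var_a, var_b))
    return mean_term + cov_term


# Usage: two tiny feature sets with equal spread but shifted means.
real_feats = [[0.0, 0.0], [2.0, 2.0]]
fake_feats = [[1.0, 1.0], [3.0, 3.0]]
print(frechet_distance_diagonal(real_feats, fake_feats))
```

Lower values mean the two feature distributions are closer. A full FID computation would use a full covariance matrix and a matrix square root over features from a pretrained Inception network.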

Practical and Theoretical Implications

Guiding image translation with energy functions is a useful development for generative modeling, particularly for applications that need high-fidelity translation without paired datasets. Practically, EGSDE's flexible trade-off between realism and faithfulness may matter in domains such as medical imaging and style transfer, where detail preservation is essential.

Theoretically, this work may encourage broader adoption of energy-based models for stochastic sampling tasks. The product-of-experts view also offers a conceptual path for incorporating multiple discriminative components directly into generative processes.

Future Directions

The paper opens avenues for future work, such as building stronger domain-independent feature extractors with disentangled representation learning techniques. Faster and more efficient sampling methods are another direction, given the computational demands of unpaired I2I tasks.

In conclusion, EGSDE advances unpaired I2I translation and sets the stage for further work on energy functions within generative models.