WaveGAN: Frequency-aware GAN for High-Fidelity Few-shot Image Generation (2207.07288v2)

Published 15 Jul 2022 in cs.CV and eess.IV

Abstract: Existing few-shot image generation approaches typically employ fusion-based strategies, either on the image or the feature level, to produce new images. However, previous approaches struggle to synthesize high-frequency signals with fine details, deteriorating the synthesis quality. To address this, we propose WaveGAN, a frequency-aware model for few-shot image generation. Concretely, we disentangle encoded features into multiple frequency components and perform low-frequency skip connections to preserve outline and structural information. Then we alleviate the generator's struggles of synthesizing fine details by employing high-frequency skip connections, thus providing informative frequency information to the generator. Moreover, we utilize a frequency L1-loss on the generated and real images to further impede frequency information loss. Extensive experiments demonstrate the effectiveness and advancement of our method on three datasets. Noticeably, we achieve new state-of-the-art with FID 42.17, LPIPS 0.3868, FID 30.35, LPIPS 0.5076, and FID 4.96, LPIPS 0.3822 respectively on Flower, Animal Faces, and VGGFace. GitHub: https://github.com/kobeshegu/ECCV2022_WaveGAN

Citations (42)

Summary

  • The paper introduces WaveGAN, a GAN architecture that uses frequency decomposition and skip connections with a frequency L1-loss to preserve both low-frequency structure and high-frequency details in few-shot image generation.
  • Experimental results show WaveGAN achieves superior FID and LPIPS scores on few-shot image generation tasks compared to existing methods, indicating enhanced output image quality and diversity.
  • WaveGAN demonstrates the practical viability of using multi-frequency analysis for synthesizing high-fidelity images from minimal data samples, suggesting new avenues for research in generative modeling.

WaveGAN: A Frequency-Aware Approach for Few-Shot Image Generation

The paper introduces WaveGAN, a novel model aimed at enhancing few-shot image generation through a frequency-aware methodology. A key limitation of existing generative adversarial networks (GANs) is their difficulty reproducing the high-frequency signals crucial for fine detail, especially when only limited data is available.

WaveGAN's central innovation is its decomposition of encoded features into distinct frequency components, augmented by the strategic application of low-frequency and high-frequency skip connections. This architecture preserves outline and structural information via the low-frequency components while enhancing detail synthesis with the high-frequency components. A frequency L1-loss between generated and real images further mitigates the loss of frequency information, a significant step toward maintaining high-frequency details.
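The idea of a frequency-domain L1 penalty can be illustrated with a small sketch. The snippet below is a hedged simplification, not the paper's released implementation: it assumes a single-level 2-D Haar decomposition on grayscale arrays and sums an L1 distance over all four subbands (the names `haar_dwt2` and `frequency_l1_loss` are placeholders of my own).

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2-D Haar transform of an (H, W) array with even H, W.
    Returns the (LL, LH, HL, HH) subbands, each of shape (H/2, W/2)."""
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low frequency: outline and structure
    lh = (a + b - c - d) / 2.0  # horizontal detail
    hl = (a - b + c - d) / 2.0  # vertical detail
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh

def frequency_l1_loss(real, fake):
    """Mean absolute difference between the wavelet subbands of a real
    and a generated image, summed over all four subbands."""
    return sum(np.abs(rb - fb).mean()
               for rb, fb in zip(haar_dwt2(real), haar_dwt2(fake)))
```

Because a constant brightness shift lives entirely in the low-frequency band, shifting an image by a constant changes only the LL term of this loss, while blurring or sharpening shows up in the detail subbands.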

Methodology

The approach leverages wavelet transformation within the generator architecture. The WaveEncoder decomposes features into multiple frequency bands, while the WaveDecoder reconstructs the image. Low-frequency components are transmitted directly from the encoder to the decoder through skip connections, maintaining foundational image structure. High-frequency components are likewise fed to the decoder, where two variants, WaveGAN-M and WaveGAN-B, offer alternative strategies for processing them. The former averages the high-frequency components across the conditional shots, while the latter keeps the components of a single base image selected by index, preserving individualized detail more reliably as the shot count changes.
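The two fusion strategies above can be sketched as follows. This is an illustrative assumption about the mechanism, not the released code: the function name `fuse_high_freq`, the mode labels, and the shape convention (one subband per shot, stacked along the first axis) are all my own.

```python
import numpy as np

def fuse_high_freq(hf_shots, mode="M", base_index=0):
    """Fuse one high-frequency subband taken from k conditional shots.

    hf_shots   -- array of shape (k, H, W), one subband per shot
    mode 'M'   -- average across shots (the WaveGAN-M strategy)
    mode 'B'   -- keep the subband of a single base shot (WaveGAN-B)
    base_index -- which shot to treat as the base when mode == 'B'
    """
    if mode == "M":
        return hf_shots.mean(axis=0)   # smooths detail across shots
    return hf_shots[base_index]        # preserves one shot's detail intact
```

Averaging tends to wash out shot-specific texture as k grows, which is consistent with the summary's note that the base-index variant maintains individualized detail more reliably as the shot count changes.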

Results

Experimental validation on datasets such as Flower, Animal Faces, and VGGFace demonstrated substantive improvements in both Fréchet Inception Distance (FID) and Learned Perceptual Image Patch Similarity (LPIPS) metrics. Notably, WaveGAN achieved superior FID scores compared to existing methods, such as LoFGAN, suggesting a marked enhancement in output image quality and diversity.

Implications and Future Directions

Practically, WaveGAN's framework allows for the generation of high-fidelity images from minimal samples, addressing a core challenge in few-shot learning with GANs. Theoretically, it underscores the importance of multi-frequency analysis for image synthesis tasks, potentially opening avenues for further exploration in frequency space across other generative tasks beyond few-shot generation.

The implications for the field are significant: this approach provides a blueprint for integrating frequency analysis with neural network architectures to preserve details that traditional pixel-space methods may overlook. Future research could combine the idea with other network architectures or explore its applicability to different modalities, such as audio or video synthesis. Such extensions could yield more robust synthesis methods that operate effectively under even tighter data constraints.

WaveGAN represents a noteworthy contribution to generative modeling in data-constrained scenarios, laying the groundwork for more nuanced frequency-aware approaches to image generation.