FDA: Fourier Domain Adaptation for Semantic Segmentation (2004.05498v1)

Published 11 Apr 2020 in cs.CV

Abstract: We describe a simple method for unsupervised domain adaptation, whereby the discrepancy between the source and target distributions is reduced by swapping the low-frequency spectrum of one with the other. We illustrate the method in semantic segmentation, where densely annotated images are aplenty in one domain (synthetic data), but difficult to obtain in another (real images). Current state-of-the-art methods are complex, some requiring adversarial optimization to render the backbone of a neural network invariant to the discrete domain selection variable. Our method does not require any training to perform the domain alignment, just a simple Fourier Transform and its inverse. Despite its simplicity, it achieves state-of-the-art performance in the current benchmarks, when integrated into a relatively standard semantic segmentation model. Our results indicate that even simple procedures can discount nuisance variability in the data that more sophisticated methods struggle to learn away.

Summary

The paper introduces a novel UDA method that swaps low-frequency amplitude spectra to align source and target image domains.
It replaces complex adversarial training with a simple Fourier transform technique, achieving notable mIoU improvements on CityScapes.
Visualization results show cleaner, more consistent segmentation maps, which suggests the method may be useful for other computer vision applications.

FDA: Fourier Domain Adaptation for Semantic Segmentation

The paper "FDA: Fourier Domain Adaptation for Semantic Segmentation" by Yanchao Yang and Stefano Soatto presents a method for Unsupervised Domain Adaptation (UDA) that uses spectral properties of images to align the distributions of the source and target domains. The approach, called Fourier Domain Adaptation (FDA), reduces the domain discrepancy by swapping the low-frequency components of the amplitude spectra between source and target images. It offers a straightforward, efficient alternative to state-of-the-art methods that often rely on complex adversarial training regimes.

Methodology

FDA rests on the observation that variations in low-level image statistics, which show up in the amplitude spectrum, can significantly degrade semantic segmentation performance when transferring from synthetic to real imagery. Instead of training an auxiliary network to handle these variations, the method computes the Fourier Transform (FT) of a source image and a target image. Within a defined low-frequency band, the amplitude spectrum of the source is replaced by that of the target, while the source phase is left unchanged. The inverse Fourier Transform (iFT) then reconstructs a source image that is 'styled' like the target but retains its original semantic content, so the source labels remain valid.
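The swap described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `fda_source_to_target`, the square window centred on the zero frequency, and the way `beta` is converted to a pixel half-width are all assumptions made for the example.

```python
import numpy as np

def fda_source_to_target(src, trg, beta=0.01):
    """Replace the low-frequency amplitude of `src` with that of `trg`.

    src, trg: float arrays of identical shape (H, W, C).
    beta: assumed here to scale the half-width of a square window
          around the zero frequency (hypothetical convention).
    """
    # 2-D FFT over the spatial axes, applied to each channel.
    fft_src = np.fft.fft2(src, axes=(0, 1))
    fft_trg = np.fft.fft2(trg, axes=(0, 1))

    # Split the source into amplitude and phase; only the target amplitude is needed.
    amp_src, pha_src = np.abs(fft_src), np.angle(fft_src)
    amp_trg = np.abs(fft_trg)

    # Move the zero frequency to the centre so the low band is one contiguous window.
    amp_src = np.fft.fftshift(amp_src, axes=(0, 1))
    amp_trg = np.fft.fftshift(amp_trg, axes=(0, 1))

    h, w = src.shape[:2]
    b = int(np.floor(min(h, w) * beta))   # half-width of the swapped window
    ch, cw = h // 2, w // 2               # position of the zero frequency after the shift
    amp_src[ch - b:ch + b + 1, cw - b:cw + b + 1] = \
        amp_trg[ch - b:ch + b + 1, cw - b:cw + b + 1]

    # Undo the shift, recombine with the untouched source phase, and invert.
    amp_src = np.fft.ifftshift(amp_src, axes=(0, 1))
    mixed = amp_src * np.exp(1j * pha_src)
    return np.real(np.fft.ifft2(mixed, axes=(0, 1)))
```

Because the window always contains the zero-frequency coefficient, the per-channel mean of the output matches that of the target image for ordinary non-negative images, which is one easy sanity check on the transform. The output is not clipped to a valid pixel range; a real pipeline would likely need to handle that.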

The method requires the selection of a single parameter β, which defines the size of the spectral neighborhood to be swapped. The strategy is robust across various values of β, providing flexibility in its application. The authors also introduce a Multi-band Transfer (MBT) scheme that aggregates the results from models trained with different β values, which further boosts performance.
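To make the role of β and of MBT more concrete, the sketch below shows one way β could map to a window size and one way predictions from several models could be aggregated. Both are assumptions for illustration: the summary above does not say how the results are aggregated, so simple averaging of per-pixel class probabilities is used here, and the names `window_half_width` and `multiband_average` are hypothetical.

```python
import numpy as np

def window_half_width(height, width, beta):
    """Half-width in pixels of the swapped low-frequency window (assumed convention)."""
    return int(np.floor(min(height, width) * beta))

def multiband_average(prob_maps):
    """Average class probabilities from several models and pick the best class.

    prob_maps: list of arrays of shape (H, W, num_classes), one per model,
               each model assumed to be trained with a different beta.
    Returns an (H, W) array of predicted class indices.
    """
    mean_prob = np.mean(np.stack(prob_maps, axis=0), axis=0)
    return mean_prob.argmax(axis=-1)

# Illustrative beta values only; a larger beta swaps a wider band of frequencies.
for beta in (0.01, 0.05, 0.09):
    print(beta, window_half_width(1024, 2048, beta))
```

Averaging probabilities rather than hard labels lets a confident model outvote an uncertain one at a given pixel, which is the usual motivation for this kind of ensemble.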

Experimental Results

The empirical evaluation showcases the effectiveness of FDA on two challenging synthetic-to-real unsupervised domain adaptation tasks: GTA5 to CityScapes and SYNTHIA to CityScapes. Different backbone architectures, namely DeepLabV2 with ResNet101 and FCN-8s with VGG16, were employed to validate the robustness of the approach. Key findings from the experimental results include:

Higher mIoU Scores: In the GTA5 to CityScapes adaptation task, FDA achieved a higher mean Intersection over Union (mIoU) than the methods it was compared against. For instance, FDA with ResNet101 reached an mIoU of 50.45% against BDL's 48.5%, a gain of 1.95 percentage points, or about 4.0% in relative terms.
Improved Performance Across Backbones: The method demonstrated consistent performance improvements across different network architectures. With VGG16, FDA-MBT surpassed the performance of BDL by obtaining an mIoU of 42.2% as opposed to BDL's 41.3%.
Visualization: Qualitative results indicated that FDA provides cleaner segmentation maps with less noise and better semantic consistency, especially in capturing finer structures and rare classes compared to BDL.
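For readers unfamiliar with the metric, the following sketch computes mIoU from a confusion matrix and separates absolute from relative gains. It is a generic illustration rather than the benchmark's official evaluation script; the function names and the `ignore_index=255` convention are assumptions.

```python
import numpy as np

def mean_iou(pred, label, num_classes, ignore_index=255):
    """Mean Intersection over Union across the classes that actually occur.

    pred, label: integer arrays of the same shape holding class indices.
    Pixels whose label equals `ignore_index` are excluded.
    """
    valid = label != ignore_index
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    idx = label[valid].astype(np.int64) * num_classes + pred[valid].astype(np.int64)
    conf = np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

    inter = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    present = union > 0          # skip classes absent from both pred and label
    return float((inter[present] / union[present]).mean())

def gains(new, baseline):
    """Absolute gain in points and relative gain in percent."""
    absolute = new - baseline
    return absolute, 100.0 * absolute / baseline

# Scores quoted above for GTA5 to CityScapes with ResNet101.
absolute, relative = gains(50.45, 48.5)
print(round(absolute, 2), round(relative, 2))
```

Benchmarks typically accumulate the confusion matrix over the whole validation set before taking the ratio, so passing all images stacked into one array is closer to standard practice than averaging per-image scores.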

Implications and Future Directions

FDA simplifies the domain adaptation process by directly aligning low-level statistics using Fourier Transforms, without relying on adversarial training or image translation networks. This reduces computational complexity and training overhead while achieving competitive performance. The method's success highlights the potential of addressing known nuisance variability through simple yet effective spectral-domain modifications.

The findings suggest several future research directions:

Broader Application: Extending the FDA approach to other computer vision tasks beyond semantic segmentation could reveal further insights into the generalizability of spectral domain alignment.
Combining with Other Techniques: Exploring hybrid models that incorporate FDA with adversarial learning or other domain adaptation strategies may yield synergistic effects, improving performance further.
Automated Parameter Selection: Developing automated or adaptive methods for selecting the β parameter could enhance the robustness and ease of the approach, making it more accessible for diverse applications.

In conclusion, the FDA method offers a promising new direction for domain adaptation by leveraging Fourier domain properties. Its simplicity and effectiveness challenge the necessity of complex adversarial training, paving the way for more efficient and potentially more interpretable domain adaptation techniques.