- The paper introduces a novel frequency-based method that discards non-essential frequency components to boost CNN performance.
- It integrates with models like ResNet-50 and MobileNetV2, achieving up to a 1.6% top-1 accuracy gain on ImageNet.
- The approach reduces input data loss and bandwidth needs, offering efficiency for resource-constrained deployments.
Learning in the Frequency Domain: An Expert Review
The paper "Learning in the Frequency Domain" presents an innovative approach for leveraging frequency-domain information in deep learning, especially for computer vision tasks involving large images. Traditional deep neural networks (DNNs), particularly convolutional neural networks (CNNs), operate in the spatial domain and require substantial downsampling of input images to fit their fixed input dimensions. This downsampling discards not only redundant information but also salient details critical to model accuracy.
Methodological Overview
The authors propose a frequency-based methodology grounded in signal-processing theory to address the spectral bias inherent in CNN models. They introduce a learning-based frequency selection mechanism that discards trivial frequency components with little impact on model accuracy. The approach integrates seamlessly with well-established networks, including ResNet-50, MobileNetV2, and Mask R-CNN, by feeding them frequency-domain inputs derived through the discrete cosine transform (DCT).
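The core input transformation can be pictured as follows: the image is split into 8x8 blocks, each block is transformed with a 2D DCT, and coefficients of the same frequency across all blocks are grouped into one channel, yielding a spatially smaller but channel-rich tensor. The sketch below is a minimal, numpy-only illustration of this idea for a single luminance channel; the function names and the 8x8 block size are assumptions for illustration (the paper works on YCbCr channels and adds further preprocessing), not the authors' actual implementation.

```python
import numpy as np

def dct2d_8x8(block):
    """Orthonormal 2D type-II DCT of an 8x8 block via the DCT basis matrix."""
    n = 8
    k = np.arange(n)
    # basis[f, x] = scale * cos(pi * (2x + 1) * f / (2n))
    basis = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    basis *= np.sqrt(2.0 / n)
    basis[0, :] = np.sqrt(1.0 / n)  # DC row uses the smaller scale
    return basis @ block @ basis.T

def to_frequency_channels(img):
    """Rearrange an HxW image into an (H/8, W/8, 64) tensor where channel c
    holds the c-th DCT coefficient of every 8x8 block (H, W divisible by 8)."""
    h, w = img.shape
    out = np.empty((h // 8, w // 8, 64))
    for i in range(0, h, 8):
        for j in range(0, w, 8):
            out[i // 8, j // 8, :] = dct2d_8x8(img[i:i + 8, j:j + 8]).ravel()
    return out
```

Because the spatial resolution shrinks by 8x in each dimension while the channel count grows to 64 per color component, such a tensor can bypass a CNN's early downsampling stages and enter the network at a matching intermediate shape.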
Experimental Insights
One of the most compelling results is the improvement in top-1 accuracy on benchmark datasets. On ImageNet, ResNet-50 and MobileNetV2 gained 1.60% and 0.63% top-1 accuracy, respectively, at the same input size as traditional approaches; with only half the input size, ResNet-50 still gained 1.42%. For instance segmentation, Mask R-CNN improved average precision by 0.8% on the COCO dataset when using the proposed frequency-based method.
Technical Contributions
The paper makes several noteworthy technical contributions:
- A thorough spectral analysis illustrating CNNs' increased sensitivity to low-frequency channels compared to high-frequency ones, aligning with characteristics of the human visual system (HVS).
- Introduction of a dynamic learning-based channel selection method that identifies non-essential frequency components, allowing for their removal during inference.
- Demonstration of network modifications that facilitate frequency-domain inputs without extensive re-engineering, thereby suggesting a drop-in replacement for conventional preprocessing pipelines.
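To make the channel-selection idea concrete: the paper learns per-channel gates (trained with a Gumbel-softmax-style relaxation) that score how important each frequency channel is, and unimportant channels are pruned at inference. The sketch below shows only the inference-time step under simplified assumptions: a sigmoid over hypothetical learned per-channel logits, with channels kept when the resulting probability clears a threshold. It is an illustration of the selection mechanism, not the paper's training procedure.

```python
import numpy as np

def select_channels(freq_input, gate_logits, threshold=0.5):
    """Inference-time frequency-channel selection.

    freq_input: (H, W, C) frequency-domain tensor.
    gate_logits: (C,) learned per-channel scores (hypothetical values here).
    Returns the tensor restricted to kept channels, plus the kept indices.
    """
    keep_prob = 1.0 / (1.0 + np.exp(-gate_logits))  # sigmoid over logits
    kept = np.flatnonzero(keep_prob > threshold)    # channels worth keeping
    return freq_input[:, :, kept], kept
```

Dropping channels this way shrinks the input tensor before it ever reaches the network, which is the source of the bandwidth savings discussed below.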
Implications and Future Directions
The findings carry both theoretical and practical implications. Theoretically, they suggest re-evaluating CNN design under spectral assumptions, possibly advocating architectures that operate natively in the frequency domain. Practically, the reduced fidelity loss, the accompanying accuracy gains, and the lower bandwidth required to transmit input data can streamline ML workflows, particularly in resource-constrained environments and edge computing setups.
Future work might explore adaptive frequency selection methods that adjust in real time as models process diverse datasets. There is also an underexplored opportunity to combine frequency-domain techniques with spatial-domain approaches to capitalize on the strengths of each perspective.
Overall, "Learning in the Frequency Domain" opens a promising avenue for reshaping foundational aspects of neural network training and inference, emphasizing the transformative potential of frequency-based methodologies in AI research.