When Semantic Segmentation Meets Frequency Aliasing (2403.09065v3)
Abstract: Despite recent advances in semantic segmentation, which pixels are hard to segment, and where they occur, remains largely unexplored. Existing research only separates an image into easy and hard regions and empirically observes that the latter are associated with object boundaries. In this paper, we conduct a comprehensive analysis of hard pixel errors, categorizing them into three types: false responses, merging mistakes, and displacements. Our findings reveal a quantitative association between hard pixels and aliasing, the distortion caused by overlapping frequency components in the Fourier domain during downsampling. To identify the frequencies responsible for aliasing, we propose using the equivalent sampling rate to calculate the Nyquist frequency, which marks the threshold for aliasing. We then introduce the aliasing score as a metric to quantify the extent of aliasing. Although all three types of hard pixels are positively correlated with the aliasing score, they exhibit different patterns. We therefore propose two novel modules, the de-aliasing filter (DAF) and frequency mixing (FreqMix), which alleviate aliasing degradation by accurately removing or adjusting frequencies above the Nyquist frequency. The DAF precisely removes the frequencies responsible for aliasing before downsampling, while FreqMix dynamically selects high-frequency components within the encoder block. Experimental results demonstrate consistent improvements on semantic segmentation and low-light instance segmentation tasks. The code is available at: https://github.com/Linwei-Chen/Seg-Aliasing.
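The idea of a Nyquist threshold set by the equivalent sampling rate, and of removing frequencies above it before downsampling, can be illustrated with a minimal NumPy sketch. It assumes a single-channel 2-D feature map, an equivalent sampling rate of 1/stride (so the Nyquist frequency is 1/(2·stride) cycles per pixel), an ideal hard low-pass mask in the FFT domain, and a simple energy-ratio definition of the aliasing score. The function names `high_freq_mask`, `aliasing_score`, and `dealias_downsample` are hypothetical. This is not the paper's exact formula or the authors' DAF/FreqMix implementation, which may differ (the repository above has the real code).

```python
import numpy as np

def high_freq_mask(h, w, stride):
    """Boolean mask of FFT bins above the Nyquist frequency after
    downsampling by `stride` (assumed: equivalent sampling rate = 1/stride)."""
    # Normalized frequencies in cycles/pixel, in the range [-0.5, 0.5)
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    nyquist = 0.5 / stride
    # Assumption: a bin aliases if either axis exceeds the threshold
    return (np.abs(fy) > nyquist) | (np.abs(fx) > nyquist)

def aliasing_score(x, stride):
    """Hypothetical metric: fraction of spectral power above Nyquist."""
    power = np.abs(np.fft.fft2(x)) ** 2
    total = power.sum()
    if total == 0:
        return 0.0
    return float(power[high_freq_mask(*x.shape, stride)].sum() / total)

def dealias_downsample(x, stride):
    """Zero out frequencies above Nyquist, then subsample by `stride`."""
    spec = np.fft.fft2(x)
    spec[high_freq_mask(*x.shape, stride)] = 0
    filtered = np.real(np.fft.ifft2(spec))
    return filtered[::stride, ::stride]

if __name__ == "__main__":
    ii, jj = np.indices((8, 8))
    checker = (-1.0) ** (ii + jj)  # all energy at 0.5 cycles/pixel
    print(aliasing_score(checker, 2))  # energy lies entirely above Nyquist
    print(checker[::2, ::2])           # naive stride-2 sampling: all ones
    print(dealias_downsample(checker, 2))  # filtered first: all zeros
```

The checkerboard shows aliasing directly. Naive stride-2 subsampling turns a pattern at the highest frequency into a constant image, which is a false low-frequency response. Filtering above the Nyquist frequency first removes that content instead of letting it fold back. A hard FFT mask is used here only for clarity, because it follows the "remove frequencies above Nyquist" description literally.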