Head-Free Lightweight Semantic Segmentation with Linear Transformer (2301.04648v1)

Published 11 Jan 2023 in cs.CV

Abstract: Existing semantic segmentation works have been mainly focused on designing effective decoders; however, the computational load introduced by the overall structure has long been ignored, which hinders their applications on resource-constrained hardwares. In this paper, we propose a head-free lightweight architecture specifically for semantic segmentation, named Adaptive Frequency Transformer. It adopts a parallel architecture to leverage prototype representations as specific learnable local descriptions which replaces the decoder and preserves the rich image semantics on high-resolution features. Although removing the decoder compresses most of the computation, the accuracy of the parallel structure is still limited by low computational resources. Therefore, we employ heterogeneous operators (CNN and Vision Transformer) for pixel embedding and prototype representations to further save computational costs. Moreover, it is very difficult to linearize the complexity of the vision Transformer from the perspective of spatial domain. Due to the fact that semantic segmentation is very sensitive to frequency information, we construct a lightweight prototype learning block with adaptive frequency filter of complexity $O(n)$ to replace standard self attention with $O(n^{2})$. Extensive experiments on widely adopted datasets demonstrate that our model achieves superior accuracy while retaining only 3M parameters. On the ADE20K dataset, our model achieves 41.8 mIoU and 4.6 GFLOPs, which is 4.4 mIoU higher than Segformer, with 45% less GFLOPs. On the Cityscapes dataset, our model achieves 78.7 mIoU and 34.4 GFLOPs, which is 2.5 mIoU higher than Segformer with 72.5% less GFLOPs. Code is available at https://github.com/dongbo811/AFFormer.

Authors (3)
  1. Bo Dong (50 papers)
  2. Pichao Wang (65 papers)
  3. Fan Wang (313 papers)
Citations (50)

Summary

Insightful Overview of "Head-Free Lightweight Semantic Segmentation with Linear Transformer"

The paper presents the Adaptive Frequency Transformer (AFFormer), a novel architecture for semantic segmentation designed to reduce computational load while preserving high accuracy. The work responds to the growing need for efficient segmentation models that can be deployed on resource-constrained hardware. Unlike traditional strategies that focus on decoder design, AFFormer adopts a head-free architecture: a parallel structure learns prototype representations as learnable local descriptions on top of high-resolution features. This eliminates the need for a decoder and removes the computational overhead it introduces in current methods; a rough sketch of the idea follows below.
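To make the head-free design concrete, the sketch below shows one way a parallel prototype branch could be fused with a high-resolution pixel-embedding branch in PyTorch. It is a minimal illustration under assumed shapes and names (e.g. ParallelPrototypeBlock, num_prototypes), not the authors' implementation.

```python
import torch
import torch.nn as nn

class ParallelPrototypeBlock(nn.Module):
    """Toy illustration of fusing a pixel-embedding branch with learnable prototypes."""

    def __init__(self, dim: int, num_prototypes: int = 16):
        super().__init__()
        # Learnable local descriptors that stand in for a decoder head (assumed count).
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, dim))
        # Lightweight CNN branch that preserves high-resolution pixel semantics.
        self.pixel_embed = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        # Transformer-style projection refining the prototype tokens.
        self.proto_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) high-resolution feature map
        b, c, h, w = x.shape
        pix = self.pixel_embed(x)                                # (B, C, H, W)
        protos = self.proto_proj(self.prototypes)                # (K, C) refined prototypes
        flat = x.flatten(2).transpose(1, 2)                      # (B, HW, C)
        # Soft-assign each pixel to the prototypes, then inject prototype context back.
        affinity = torch.softmax(flat @ protos.t(), dim=-1)      # (B, HW, K)
        context = (affinity @ protos).transpose(1, 2).reshape(b, c, h, w)
        return pix + context                                     # fused features, no decoder head
```

In the paper, the prototype learning side is the part handled by the adaptive frequency block discussed next.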

AFFormer further reduces cost by using heterogeneous operators, CNNs and Vision Transformers, for pixel embedding and prototype representations. Because the quadratic complexity of a Vision Transformer is difficult to linearize from the spatial-domain perspective, and because semantic segmentation is highly sensitive to frequency information, the authors instead construct a lightweight prototype learning block with an adaptive frequency filter. This block operates at O(n) complexity, replacing standard self-attention, which scales as O(n²).
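The snippet below sketches the general idea of frequency-domain token mixing as a substitute for pairwise self-attention. It is a hypothetical illustration: the FFT-based filter shown here costs O(n log n), whereas the paper's adaptive frequency filter is constructed to reach O(n); module and parameter names are assumptions.

```python
import torch
import torch.nn as nn

class FrequencyFilterBlock(nn.Module):
    """Toy frequency-domain mixing layer standing in for quadratic self-attention."""

    def __init__(self, dim: int, height: int, width: int):
        super().__init__()
        # One learnable complex-valued weight per channel and frequency bin (assumed shape).
        self.filter = nn.Parameter(torch.randn(dim, height, width // 2 + 1, 2) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W). Mixing is global per frequency bin, with no pairwise attention map.
        freq = torch.fft.rfft2(x, norm="ortho")                     # (B, C, H, W//2 + 1), complex
        weight = torch.view_as_complex(self.filter)                 # (C, H, W//2 + 1)
        freq = freq * weight                                        # re-weight each frequency component
        out = torch.fft.irfft2(freq, s=x.shape[-2:], norm="ortho")  # back to the spatial domain
        return out + x                                              # residual connection
```

This is close in spirit to Fourier-based token mixers in general; the paper's own adaptive filter differs in how it achieves strictly linear complexity, which the sketch does not reproduce.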

Empirical results demonstrate substantial gains. On the ADE20K dataset, AFFormer achieves 41.8 mIoU at 4.6 GFLOPs, 4.4 mIoU higher than Segformer while requiring 45% fewer GFLOPs. A similar pattern holds on Cityscapes, where AFFormer reaches 78.7 mIoU at 34.4 GFLOPs, outperforming Segformer by 2.5 mIoU with 72.5% less computation. These results, obtained with a model of only 3M parameters, underscore AFFormer's ability to deliver superior segmentation accuracy at modest computational cost, making it well suited to constrained environments.
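As a quick sanity check on the efficiency claims, the implied Segformer baselines can be back-calculated from the stated percentage reductions; the figures below are derived arithmetic, not numbers reported in this summary.

```python
# Derived from the quoted reductions, not independently reported figures.
afformer_ade20k_gflops = 4.6
afformer_cityscapes_gflops = 34.4
segformer_ade20k_gflops = afformer_ade20k_gflops / (1 - 0.45)            # ~8.4 GFLOPs implied
segformer_cityscapes_gflops = afformer_cityscapes_gflops / (1 - 0.725)   # ~125 GFLOPs implied
print(round(segformer_ade20k_gflops, 1), round(segformer_cityscapes_gflops, 1))  # 8.4 125.1
```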

This paper not only advances the efficiency and accuracy of semantic segmentation models but also opens new methodological possibilities within AI. By showing that adaptive frequency filtering can be integrated into learning architectures without large computational costs, the research points toward more lightweight and scalable AI systems. Practically, AFFormer could broaden the deployment of semantic segmentation in real-time and embedded systems.

Looking forward, this research encourages exploration of more adaptive, frequency-aware transformations in AI models and a deeper understanding of how different data representations can balance accuracy against feasibility in low-resource settings. Such advances could contribute to fields such as autonomous driving and robotics, where computational efficiency and accuracy are both paramount.

GitHub: https://github.com/dongbo811/AFFormer