- The paper introduces SpectralFormer, a transformer-based model that employs group-wise spectral embedding and cross-layer adaptive fusion to enhance hyperspectral image classification.
- It achieves overall accuracies of up to 91.07% on benchmark datasets such as Pavia University, demonstrating robust performance.
- The innovative design efficiently integrates spatial and spectral information, addressing the limitations of conventional CNNs and RNNs in processing high-dimensional data.
The paper "SpectralFormer: Rethinking Hyperspectral Image Classification with Transformers" introduces a novel backbone network designed to leverage the strengths of transformers for hyperspectral (HS) image classification. The proposed network, SpectralFormer, addresses notable limitations of conventional convolutional neural networks (CNNs) and recurrent neural networks (RNNs) in modeling the sequential nature of spectral data and in producing rich feature embeddings.
Motivation
Hyperspectral imaging captures extensive spectral information at each pixel, enabling fine-grained identification of materials. Traditional CNNs exploit spatial-contextual information well but fall short in modeling the sequential attributes of spectral data. RNNs, while inherently better suited to sequences, suffer from vanishing gradients and poor parallelizability. Transformers, whose self-attention mechanisms excel at modeling long-range dependencies in sequences, are adopted here to mitigate these challenges.
Proposed Methodology
The key innovations within SpectralFormer are:
- Group-wise Spectral Embedding (GSE): Instead of the conventional band-wise embedding, GSE learns local spectral representations by organizing neighboring bands into overlapping groups, yielding richer low-level features for the transformer encoder.
- Cross-layer Adaptive Fusion (CAF): To counter information loss across layers, CAF adaptively learns to fuse memory-like components from shallow layers into deep layers, preserving valuable information throughout the network depth.
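The group-wise embedding idea can be illustrated with a minimal numpy sketch. This is not the authors' implementation; the group size, embedding dimension, and edge padding are illustrative assumptions, and the projection weights are random stand-ins for learned parameters.

```python
import numpy as np

def group_wise_spectral_embedding(spectrum, group_size=3, embed_dim=8, rng=None):
    """Embed each band jointly with its neighboring bands (GSE sketch).

    spectrum: 1-D array of shape (num_bands,)
    returns:  array of shape (num_bands, embed_dim), one token per band
    """
    rng = rng or np.random.default_rng(0)
    num_bands = spectrum.shape[0]
    half = group_size // 2
    # Pad the spectrum at both ends so every band has a full neighborhood
    # (edge padding is an assumption made for this sketch).
    padded = np.pad(spectrum, (half, half), mode="edge")
    # Gather overlapping groups: one per band, each of length group_size.
    groups = np.stack([padded[i:i + group_size] for i in range(num_bands)])
    # A single projection (random here, learned in practice) maps each
    # group of neighboring bands to one embedding vector, so every token
    # reflects a local spectral neighborhood rather than a single band.
    weight = rng.standard_normal((group_size, embed_dim))
    return groups @ weight

tokens = group_wise_spectral_embedding(np.linspace(0.0, 1.0, 200))
print(tokens.shape)  # (200, 8): one token per band, embedding its local group
```

Band-wise embedding corresponds to the degenerate case `group_size=1`; widening the group lets each token capture the local spectral discrepancies the paper emphasizes.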
The SpectralFormer framework is flexible, accommodating both pixel-wise and patch-wise inputs, thus not only focusing on spectral but also on spatial information when necessary.
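The CAF module described above can likewise be sketched in numpy. The exact gating form below (a sigmoid gate over concatenated shallow and deep features) is an assumption chosen to illustrate adaptive fusion, not the paper's precise formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_layer_adaptive_fusion(shallow, deep, gate_weight):
    """Adaptively fuse a shallow-layer feature into a deep-layer feature.

    shallow, deep: token features of shape (num_tokens, dim)
    gate_weight:   projection of shape (2*dim, dim), learned in practice
    """
    # Concatenate the skipped (shallow) and current (deep) features, then
    # compute a per-element gate deciding how much shallow "memory" to
    # carry forward; the output interpolates between the two features.
    fused_input = np.concatenate([shallow, deep], axis=-1)
    gate = sigmoid(fused_input @ gate_weight)
    return gate * shallow + (1.0 - gate) * deep

# Toy usage with random features standing in for two transformer layers.
rng = np.random.default_rng(1)
shallow = rng.standard_normal((6, 16))
deep = rng.standard_normal((6, 16))
gate_w = rng.standard_normal((32, 16))
fused = cross_layer_adaptive_fusion(shallow, deep, gate_w)
print(fused.shape)  # (6, 16): same shape as the layer features
```

Because the gate lies in (0, 1), each fused element is an interpolation between the shallow and deep values, which is what lets the network retain early-layer information without simply overwriting deeper representations.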
Experimental Validation
SpectralFormer was evaluated on three established hyperspectral benchmarks: Indian Pines, Pavia University, and Houston2013. The results show consistent gains over the classic ViT and over state-of-the-art CNN- and RNN-based backbones. For instance:
- Indian Pines Dataset: SpectralFormer demonstrated a notable improvement, achieving an overall accuracy (OA) of up to 81.76% with patch-wise input, significantly surpassing the classic ViT and conventional CNN-based methods.
- Pavia University Dataset: The patch-wise SpectralFormer achieved an OA of 91.07%, benefiting from the enriched spatial-spectral feature embedding.
- Houston2013 Dataset: SpectralFormer again outperformed other methods with an OA of 88.01%, showcasing its robustness across different datasets.
Discussion
The proposed GSE enables SpectralFormer to capture subtle spectral discrepancies more effectively, which is crucial for accurately differentiating materials with similar spectral signatures. The CAF module further ensures that the network retains critical information as it propagates through deeper layers. These enhancements collectively empower transformers to overcome the traditional shortcomings observed in CNNs and RNNs for hyperspectral image classification.
The implications of this research are both practical and theoretical. Practically, SpectralFormer can greatly enhance the accuracy and efficiency of hyperspectral image classification in various applications, such as precision agriculture, urban planning, and mineral exploration. Theoretically, this work pushes the boundaries of how transformers can be adapted and optimized for domain-specific applications involving sequential and high-dimensional data.
Future Developments
The authors suggest several avenues for future exploration:
- Advanced Features: Incorporating more advanced self-attention mechanisms and self-supervised learning techniques to further improve model performance.
- Lightweight Models: Developing streamlined versions of SpectralFormer to reduce computational overhead without compromising accuracy.
- Incorporating Domain Knowledge: Embedding physical characteristics of spectral bands and other prior knowledge to achieve more interpretable models.
In conclusion, SpectralFormer represents a significant step forward in adapting transformer architectures for hyperspectral image analysis, substantially outperforming incumbent methods and paving the way for further innovations in the field.