HSC-SAM: Hyperspectral Camouflage Segmentation
- The paper introduces HSC-SAM, a framework that adapts SAM using spectral–spatial decomposition and adaptive prompting to achieve state-of-the-art hyperspectral camouflaged object detection.
- It employs dual token streams, spectral-guided token dropout, and prompt fusion to efficiently integrate rich spectral cues for precise segmentation.
- The model outperforms RGB and HSI baselines on the HyperCOD benchmark with a compact computational profile, indicating its potential for broader hyperspectral segmentation tasks.
HyperSpectral Camouflage-aware SAM (HSC-SAM) is a segmentation framework designed to bridge the modality gap between hyperspectral imagery (HSI) and foundation vision models, specifically adapting Meta's Segment Anything Model (SAM) for hyperspectral camouflaged object detection (HCOD). Developed alongside the HyperCOD benchmark, HSC-SAM leverages explicit spatial–spectral decomposition, spectral saliency-based adaptive prompting, and token selection to achieve state-of-the-art (SOTA) performance on challenging camouflaged object delineation tasks in hyperspectral data (Bai et al., 7 Jan 2026).
1. Architectural Design and Components
HSC-SAM re-engineers the SAM pipeline to fully exploit the unique spectral information in HSI cubes. The architecture introduces the following sequence of components:
- Spectral-Spatial Decomposition Module (SSDM): Splits an HSI cube into a "spatial map" (pseudo-RGB via CIEXYZ) for image encoding, and a "spectral saliency map" (derived from spectral angular analysis) as a semantic prompt.
- Dual Token Streams: the pseudo-RGB spatial map is tokenized for the image encoder and the spectral saliency map for the prompt encoder; both streams progress through the transformer layers.
- Spectral-Guided Token Dropout (SGTD): A channel-wise saliency mask prunes tokens of low spectral relevance prior to transformer self-attention, reducing computational burden and suppressing distractors.
- Spectral-Spatial Complementary Prompting (SSCP): Within each transformer block, tokens from both streams are fused via a prompt-fusion block to enhance cross-modal feature integration.
- Segmentation Decoder and Fusion Detail Enhancer (FDE): Coarse segmentation is generated by the SAM decoder; the FDE module injects low-level spatial details to refine mask boundaries without incurring additional inference costs.
This modular adaptation enables HSC-SAM to leverage both spatial similarity and subtle spectral contrast in HSI, which is critical for camouflaged object segmentation.
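The component sequence above can be sketched end-to-end. The following is a minimal illustrative pipeline, not the authors' implementation: module names, shapes, the band-subset pseudo-RGB, the per-band standard deviation used as a saliency proxy, and the additive "fusion" are all simplifying assumptions.

```python
import numpy as np

def ssdm(hsi):
    """Spectral-Spatial Decomposition stand-in: HSI cube -> (spatial map, saliency map).

    The real SSDM uses a CIEXYZ projection and spectral angular analysis;
    here three raw bands and per-pixel spectral std are crude proxies."""
    h, w, b = hsi.shape
    spatial = hsi[..., [b // 4, b // 2, 3 * b // 4]]              # pseudo-RGB stand-in
    saliency = np.repeat(hsi.std(axis=-1, keepdims=True), 3, -1)  # spectral-contrast proxy
    return spatial, saliency

def tokenize(img, patch=16):
    """Split an H x W x 3 map into flattened, non-overlapping patch tokens."""
    h, w, c = img.shape
    tokens = img.reshape(h // patch, patch, w // patch, patch, c)
    return tokens.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)

def hsc_sam_forward(hsi, tau=0.01):
    spatial, saliency = ssdm(hsi)                    # SSDM: split cube into two maps
    img_tokens = tokenize(spatial)                   # image-encoder stream
    prompt_tokens = tokenize(saliency)               # prompt-encoder stream
    scores = prompt_tokens.mean(axis=1)              # SGTD: per-token saliency score
    keep = scores > tau                              # prune low-relevance tokens
    fused = img_tokens[keep] + prompt_tokens[keep]   # SSCP: naive additive prompt fusion
    return fused

hsi = np.random.rand(64, 64, 200)                    # 200-band cube, as in HyperCOD
out = hsc_sam_forward(hsi)
print(out.shape)
```

With a 64×64×200 cube and 16×16 patches this yields at most 16 fused tokens of dimension 768; the real model would pass these through SAM's transformer blocks and decoder.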
2. Spectral–Spatial Feature Construction and Prompting
Fundamental to HSC-SAM is the translation of the multi-band HSI input into spatial and spectral representations suitable for SAM's architecture:
- CIEXYZ Spatial Map Construction: each pixel spectrum $E(\lambda)$ is projected to tristimulus values
  $$X = \int E(\lambda)\,\bar{x}(\lambda)\,\mathrm{d}\lambda,\qquad Y = \int E(\lambda)\,\bar{y}(\lambda)\,\mathrm{d}\lambda,\qquad Z = \int E(\lambda)\,\bar{z}(\lambda)\,\mathrm{d}\lambda,$$
  where $\bar{x}(\lambda), \bar{y}(\lambda), \bar{z}(\lambda)$ are the standard CIE color-matching functions.
- Spectral Saliency Map via Spectral Angular Distance (SAD): at each pyramid level, patchwise spectral vectors $\mathbf{a}$ are compared with coarser-scale patch spectra $\mathbf{b}$ via
  $$\theta(\mathbf{a}, \mathbf{b}) = \arccos\!\left(\frac{\mathbf{a}^{\top}\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}\right).$$
  Concatenating the resulting angle maps over the selected scales forms the three-channel prompt.
- Token-Level Saliency and Dropout: embedded spectral tokens are scored channel-wise against the saliency prompt, yielding a per-token score $s_i$. Binarization at threshold $\tau$ gives a mask $M_i = \mathbb{1}[s_i > \tau]$, which drops non-salient tokens before self-attention.
These mechanisms inject explicit spectral priors into the transformer attention regime, aligning HSI cues with the prompting logic of SAM.
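The three mechanisms above can be illustrated concretely. The sketch below is not the authors' code: Gaussian curves stand in for the tabulated CIE color-matching functions, a single scale replaces the multi-scale pyramid, and each patch is compared to the global mean spectrum rather than coarser-scale neighbors.

```python
import numpy as np

def pseudo_rgb(hsi, wavelengths):
    """CIEXYZ-style spatial map: weight bands by approximate color-matching functions."""
    def gauss(mu, sigma):
        return np.exp(-0.5 * ((wavelengths - mu) / sigma) ** 2)
    # Crude Gaussian stand-ins for the tabulated x-bar, y-bar, z-bar curves.
    cmfs = np.stack([gauss(600, 40), gauss(550, 40), gauss(450, 30)])
    xyz = hsi @ cmfs.T                        # (H, W, 3) tristimulus map
    return xyz / (xyz.max() + 1e-8)

def sad(a, b):
    """Spectral angular distance between two spectra."""
    cos = (a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def saliency_map(hsi, patch=8):
    """Per-patch SAD between the local mean spectrum and the global mean
    spectrum (a one-scale simplification of the multi-scale pyramid)."""
    h, w, _ = hsi.shape
    global_spec = hsi.mean((0, 1))
    sal = np.zeros((h // patch, w // patch))
    for i in range(h // patch):
        for j in range(w // patch):
            local = hsi[i*patch:(i+1)*patch, j*patch:(j+1)*patch].mean((0, 1))
            sal[i, j] = sad(local, global_spec)
    return sal

def dropout_mask(sal, tau=0.01):
    """SGTD-style binarization: keep only tokens whose saliency exceeds tau."""
    return sal > tau

hsi = np.random.rand(32, 32, 200)
wl = np.linspace(400, 1000, 200)           # assumed wavelength grid in nm
rgb = pseudo_rgb(hsi, wl)
sal = saliency_map(hsi)
mask = dropout_mask(sal)
print(rgb.shape, sal.shape, mask.sum())
```

The boolean mask then indexes the token sequence before self-attention, so pruned positions never enter the attention computation.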
3. Training Protocol and Optimization Strategy
The model is trained on the HyperCOD dataset, consisting of 350 HSIs (280 train, 70 test). The loss aggregates binary cross-entropy (BCE) and intersection-over-union (IoU) terms over both the coarse decoder output $P_d$ and the final refined output $P_f$ against the ground truth $G$:
$$\mathcal{L} = \mathcal{L}_{\mathrm{BCE}}(P_d, G) + \mathcal{L}_{\mathrm{IoU}}(P_d, G) + \mathcal{L}_{\mathrm{BCE}}(P_f, G) + \mathcal{L}_{\mathrm{IoU}}(P_f, G).$$
Implementation details (learning rates, augmentations, epoch schedules) follow standard SAM-tuning practice; explicit values are not given.
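A minimal sketch of this combined objective, assuming equal weighting of the four terms (the paper does not state per-term weights) and generic soft-IoU and BCE definitions:

```python
import numpy as np

def bce(pred, gt, eps=1e-7):
    """Pixelwise binary cross-entropy on probabilities in [0, 1]."""
    pred = np.clip(pred, eps, 1 - eps)
    return -(gt * np.log(pred) + (1 - gt) * np.log(1 - pred)).mean()

def iou_loss(pred, gt, eps=1e-7):
    """Soft IoU loss: 1 minus intersection-over-union of soft masks."""
    inter = (pred * gt).sum()
    union = (pred + gt - pred * gt).sum()
    return 1.0 - (inter + eps) / (union + eps)

def total_loss(coarse, final, gt):
    """BCE + IoU applied to both the decoder output and the refined output."""
    return bce(coarse, gt) + iou_loss(coarse, gt) + bce(final, gt) + iou_loss(final, gt)

gt = (np.random.rand(64, 64) > 0.5).astype(float)
coarse = np.clip(gt + 0.1 * np.random.randn(64, 64), 0, 1)
final = np.clip(gt + 0.05 * np.random.randn(64, 64), 0, 1)
print(total_loss(coarse, final, gt))
```

Supervising the coarse output alongside the final one gives the decoder a direct gradient signal even before FDE refinement.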
Ablation results indicate the contribution of each architectural piece; SSDM, SSCP, SGTD, and FDE each lead to incremental gains (+15.6%, +7.0%, +6.9%, and +6.9% in Adaptive-F, respectively) (Bai et al., 7 Jan 2026).
4. Quantitative Benchmarks and Comparative Analysis
Performance on the HyperCOD test split (70 images) demonstrates the efficacy of HSC-SAM:
| Model | MAE | E-measure | S-measure | Adaptive-F | #Params (M) | GFLOPs |
|---|---|---|---|---|---|---|
| HSC-SAM | 0.0017 | 0.853 | 0.802 | 0.681 | 11.7 | 94.2 |
| SAM2-UNet | — | — | — | — | 216 | 128 |
| All other RGB/HSI baselines | worse | worse | worse | worse | — | — |
HSC-SAM outperforms RGB-based COD methods (e.g., SINet-V2, ZoomNet, FRINet, SAM2-UNet, HGINet, Camoformer) and HSI SOD baselines (SAD, DMSSN, SMN-PVT, Hyper-HRNet) in all key metrics, especially in scenes with cluttered backgrounds, dynamic lighting, and occlusions. Its parameter and computational footprint is substantially lower than prior SAM variants (11.7 M/94.2 G vs. 216 M/128 G).
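Two of the reported metrics are easy to reproduce generically. The sketch below is an illustrative re-implementation, not the benchmark's evaluation code; it assumes the common "2× mean" adaptive threshold and $\beta^2 = 0.3$ for the adaptive F-measure, which the source does not confirm.

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a soft prediction and a binary ground truth."""
    return np.abs(pred - gt).mean()

def adaptive_f(pred, gt, beta2=0.3, eps=1e-8):
    """Adaptive F-measure: binarize at twice the prediction mean, then
    combine precision and recall with weight beta2 (assumed value 0.3)."""
    thresh = min(2 * pred.mean(), 1.0)
    binary = pred >= thresh
    tp = (binary & (gt > 0.5)).sum()
    prec = tp / (binary.sum() + eps)
    rec = tp / ((gt > 0.5).sum() + eps)
    return (1 + beta2) * prec * rec / (beta2 * prec + rec + eps)

gt = np.zeros((64, 64)); gt[20:40, 20:40] = 1.0   # synthetic square target
pred = 0.9 * gt                                   # confident, well-localized prediction
print(mae(pred, gt), adaptive_f(pred, gt))
```

E-measure and S-measure additionally account for alignment and structural similarity and need the original evaluation code for exact comparability.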
5. Generalization Potential and Dataset Transferability
Though direct cross-dataset evaluation numbers are not presented, the modular spectral–spatial design and prompting mechanisms are postulated to transfer robustly to other public HSI datasets (e.g., HSOD-BIT, HSOD-BIT-V2). The inherent flexibility of HSC-SAM’s decoupled architecture suggests applicability beyond camouflage detection to general HSI segmentation tasks.
A plausible implication is that the spectral saliency-guided prompting, coupled with token dropout, would adapt efficiently to domains where the spectral domain carries the discriminative cues absent from RGB.
6. Computational Complexity and Limitations
HSC-SAM has 11.7 million parameters and requires 94.2 GFLOPs per HSI. Inference runs at ~19.1 FPS with SGTD and ~20.3 FPS without it, at the cost of a −4.2% drop in Adaptive-F when token dropout is disabled.
Limitations include:
- The SGTD threshold must be tuned (optimal at 0.01 for HyperCOD).
- FDE operates only during training; run-time boundary refinement depends on feature quality upstream.
- The framework does not explicitly address robustness to extreme spectral noise or unmodeled sensor variations.
These factors should be considered for deployment in varying HSI acquisition contexts.
7. Significance and Impact
Key contributions of HSC-SAM include:
- Introduction of HyperCOD, filling a benchmark gap for real-world HSI COD with 350 annotated samples and 200 bands.
- Development of HSC-SAM, operationalizing spectral–spatial decomposition, adaptive prompting, token selection, and training-time refinement specifically for large vision models and HSI.
- Establishment of a new SOTA (MAE=0.0017) on HyperCOD, exceeding both RGB and HSI-specific baselines with a more compact computational profile.
- The architecture’s components (SSDM, SGTD, SSCP, FDE) have potential as general modules for future hyperspectral foundation models and other HSI segmentation tasks.
HSC-SAM marks a significant step in integrating hyperspectral priors into large-scale vision modeling, demonstrating that explicitly distilling spectral structure into modern transformer architectures is fundamental for high-fidelity segmentation under spectral camouflage conditions (Bai et al., 7 Jan 2026).