HSC-SAM: Hyperspectral Camouflage Segmentation
- The paper introduces HSC-SAM, a framework that adapts SAM using spectral–spatial decomposition and adaptive prompting to achieve state-of-the-art hyperspectral camouflaged object detection.
- It employs dual token streams, spectral-guided token dropout, and prompt fusion to efficiently integrate rich spectral cues for precise segmentation.
- The model outperforms RGB and HSI baselines on the HyperCOD benchmark with a compact computational profile, indicating its potential for broader hyperspectral segmentation tasks.
HyperSpectral Camouflage-aware SAM (HSC-SAM) is a segmentation framework designed to bridge the modality gap between hyperspectral imagery (HSI) and foundation vision models, specifically adapting Meta's Segment Anything Model (SAM) for hyperspectral camouflaged object detection (HCOD). Developed alongside the HyperCOD benchmark, HSC-SAM leverages explicit spatial–spectral decomposition, spectral saliency-based adaptive prompting, and token selection to achieve state-of-the-art (SOTA) performance on challenging camouflaged object delineation tasks in hyperspectral data (Bai et al., 7 Jan 2026).
1. Architectural Design and Components
HSC-SAM re-engineers the SAM pipeline to fully exploit the unique spectral information in HSI cubes. The architecture introduces the following sequence of components:
- Spectral-Spatial Decomposition Module (SSDM): Splits an HSI cube into a "spatial map" (pseudo-RGB via CIEXYZ) for image encoding, and a "spectral saliency map" (derived from spectral angular analysis) as a semantic prompt.
- Dual Token Streams: the pseudo-RGB spatial map is tokenized for the image encoder and the spectral saliency map for the prompt encoder; both streams progress through the transformer layers.
- Spectral-Guided Token Dropout (SGTD): A channel-wise saliency mask prunes tokens of low spectral relevance prior to transformer self-attention, reducing computational burden and suppressing distractors.
- Spectral-Spatial Complementary Prompting (SSCP): Within each transformer block, tokens from both streams are fused via a prompt-fusion block to enhance cross-modal feature integration.
- Segmentation Decoder and Fusion Detail Enhancer (FDE): Coarse segmentation is generated by the SAM decoder; the FDE module injects low-level spatial details to refine mask boundaries without incurring additional inference costs.
This modular adaptation enables HSC-SAM to leverage both spatial similarity and subtle spectral contrast in HSI, which is critical for camouflaged object segmentation.
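The component sequence above can be sketched end-to-end. The following is a minimal illustrative pipeline, not the authors' implementation: module names, shapes, the band-subset pseudo-RGB, the per-band standard deviation used as a saliency proxy, and the additive "fusion" are all simplifying assumptions.

```python
import numpy as np

def ssdm(hsi):
    """Spectral-Spatial Decomposition stand-in: HSI cube -> (spatial map, saliency map).

    The real SSDM uses a CIEXYZ projection and spectral angular analysis;
    here three raw bands and per-pixel spectral std are crude proxies."""
    h, w, b = hsi.shape
    spatial = hsi[..., [b // 4, b // 2, 3 * b // 4]]              # pseudo-RGB stand-in
    saliency = np.repeat(hsi.std(axis=-1, keepdims=True), 3, -1)  # spectral-contrast proxy
    return spatial, saliency

def tokenize(img, patch=16):
    """Split an H x W x 3 map into flattened, non-overlapping patch tokens."""
    h, w, c = img.shape
    tokens = img.reshape(h // patch, patch, w // patch, patch, c)
    return tokens.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)

def hsc_sam_forward(hsi, tau=0.01):
    spatial, saliency = ssdm(hsi)                    # SSDM: split cube into two maps
    img_tokens = tokenize(spatial)                   # image-encoder stream
    prompt_tokens = tokenize(saliency)               # prompt-encoder stream
    scores = prompt_tokens.mean(axis=1)              # SGTD: per-token saliency score
    keep = scores > tau                              # prune low-relevance tokens
    fused = img_tokens[keep] + prompt_tokens[keep]   # SSCP: naive additive prompt fusion
    return fused

hsi = np.random.rand(64, 64, 200)                    # 200-band cube, as in HyperCOD
out = hsc_sam_forward(hsi)
print(out.shape)
```

With a 64×64×200 cube and 16×16 patches this yields at most 16 fused tokens of dimension 768; the real model would pass these through SAM's transformer blocks and decoder.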
2. Spectral–Spatial Feature Construction and Prompting
Fundamental to HSC-SAM is the translation of the multi-band HSI input into spatial and spectral representations suitable for SAM's architecture:
- CIEXYZ Spatial Map Construction: each pixel spectrum $E(\lambda)$ is projected to tristimulus values
  $$X = \int E(\lambda)\,\bar{x}(\lambda)\,\mathrm{d}\lambda,\qquad Y = \int E(\lambda)\,\bar{y}(\lambda)\,\mathrm{d}\lambda,\qquad Z = \int E(\lambda)\,\bar{z}(\lambda)\,\mathrm{d}\lambda,$$
  where $\bar{x}(\lambda), \bar{y}(\lambda), \bar{z}(\lambda)$ are the standard CIE color-matching functions.
- Spectral Saliency Map via Spectral Angular Distance (SAD): at each pyramid level, patchwise spectral vectors $\mathbf{a}$ are compared with coarser-scale patch spectra $\mathbf{b}$ via
  $$\theta(\mathbf{a}, \mathbf{b}) = \arccos\!\left(\frac{\mathbf{a}^{\top}\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}\right).$$
  Concatenating the resulting angle maps over the selected scales forms the three-channel prompt.
- Token-Level Saliency and Dropout: embedded spectral tokens are scored channel-wise against the saliency prompt, yielding a per-token score $s_i$. Binarization at threshold $\tau$ gives a mask $M_i = \mathbb{1}[s_i > \tau]$, which drops non-salient tokens before self-attention.
These mechanisms inject explicit spectral priors into the transformer attention regime, aligning HSI cues with the prompting logic of SAM.
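The three mechanisms above can be illustrated concretely. The sketch below is not the authors' code: Gaussian curves stand in for the tabulated CIE color-matching functions, a single scale replaces the multi-scale pyramid, and each patch is compared to the global mean spectrum rather than coarser-scale neighbors.

```python
import numpy as np

def pseudo_rgb(hsi, wavelengths):
    """CIEXYZ-style spatial map: weight bands by approximate color-matching functions."""
    def gauss(mu, sigma):
        return np.exp(-0.5 * ((wavelengths - mu) / sigma) ** 2)
    # Crude Gaussian stand-ins for the tabulated x-bar, y-bar, z-bar curves.
    cmfs = np.stack([gauss(600, 40), gauss(550, 40), gauss(450, 30)])
    xyz = hsi @ cmfs.T                        # (H, W, 3) tristimulus map
    return xyz / (xyz.max() + 1e-8)

def sad(a, b):
    """Spectral angular distance between two spectra."""
    cos = (a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def saliency_map(hsi, patch=8):
    """Per-patch SAD between the local mean spectrum and the global mean
    spectrum (a one-scale simplification of the multi-scale pyramid)."""
    h, w, _ = hsi.shape
    global_spec = hsi.mean((0, 1))
    sal = np.zeros((h // patch, w // patch))
    for i in range(h // patch):
        for j in range(w // patch):
            local = hsi[i*patch:(i+1)*patch, j*patch:(j+1)*patch].mean((0, 1))
            sal[i, j] = sad(local, global_spec)
    return sal

def dropout_mask(sal, tau=0.01):
    """SGTD-style binarization: keep only tokens whose saliency exceeds tau."""
    return sal > tau

hsi = np.random.rand(32, 32, 200)
wl = np.linspace(400, 1000, 200)           # assumed wavelength grid in nm
rgb = pseudo_rgb(hsi, wl)
sal = saliency_map(hsi)
mask = dropout_mask(sal)
print(rgb.shape, sal.shape, mask.sum())
```

The boolean mask then indexes the token sequence before self-attention, so pruned positions never enter the attention computation.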
3. Training Protocol and Optimization Strategy
The model is trained on the HyperCOD dataset, consisting of 350 HSIs (280 train, 70 test). The loss aggregates binary cross-entropy (BCE) and intersection-over-union (IoU) terms over both the coarse decoder output $P_d$ and the final refined output $P_f$ against the ground truth $G$:
$$\mathcal{L} = \mathcal{L}_{\mathrm{BCE}}(P_d, G) + \mathcal{L}_{\mathrm{IoU}}(P_d, G) + \mathcal{L}_{\mathrm{BCE}}(P_f, G) + \mathcal{L}_{\mathrm{IoU}}(P_f, G).$$
Implementation details (learning rates, augmentations, epoch schedules) follow standard SAM-tuning practice; explicit values are not given.
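A minimal sketch of this combined objective, assuming equal weighting of the four terms (the paper does not state per-term weights) and generic soft-IoU and BCE definitions:

```python
import numpy as np

def bce(pred, gt, eps=1e-7):
    """Pixelwise binary cross-entropy on probabilities in [0, 1]."""
    pred = np.clip(pred, eps, 1 - eps)
    return -(gt * np.log(pred) + (1 - gt) * np.log(1 - pred)).mean()

def iou_loss(pred, gt, eps=1e-7):
    """Soft IoU loss: 1 minus intersection-over-union of soft masks."""
    inter = (pred * gt).sum()
    union = (pred + gt - pred * gt).sum()
    return 1.0 - (inter + eps) / (union + eps)

def total_loss(coarse, final, gt):
    """BCE + IoU applied to both the decoder output and the refined output."""
    return bce(coarse, gt) + iou_loss(coarse, gt) + bce(final, gt) + iou_loss(final, gt)

gt = (np.random.rand(64, 64) > 0.5).astype(float)
coarse = np.clip(gt + 0.1 * np.random.randn(64, 64), 0, 1)
final = np.clip(gt + 0.05 * np.random.randn(64, 64), 0, 1)
print(total_loss(coarse, final, gt))
```

Supervising the coarse output alongside the final one gives the decoder a direct gradient signal even before FDE refinement.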
Ablation results indicate the contribution of each architectural piece; SSDM, SSCP, SGTD, and FDE each lead to incremental gains (+15.6%, +7.0%, +6.9%, and +6.9% in Adaptive-F, respectively) (Bai et al., 7 Jan 2026).
4. Quantitative Benchmarks and Comparative Analysis
Performance on the HyperCOD test split (70 images) demonstrates the efficacy of HSC-SAM:
| Model | MAE | E-measure | S-measure | Adaptive-F | #Params (M) | GFLOPs |
|---|---|---|---|---|---|---|
| HSC-SAM | 0.0017 | 0.853 | 0.802 | 0.681 | 11.7 | 94.2 |
| SAM2-UNet | — | — | — | — | 216 | 128 |
| All other RGB/HSI baselines | worse | worse | worse | worse | — | — |
HSC-SAM outperforms RGB-based COD methods (e.g., SINet-V2, ZoomNet, FRINet, SAM2-UNet, HGINet, Camoformer) and HSI SOD baselines (SAD, DMSSN, SMN-PVT, Hyper-HRNet) in all key metrics, especially in scenes with cluttered backgrounds, dynamic lighting, and occlusions. Its parameter and computational footprint is substantially lower than prior SAM variants (11.7 M/94.2 G vs. 216 M/128 G).
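Two of the reported metrics are easy to reproduce generically. The sketch below is an illustrative re-implementation, not the benchmark's evaluation code; it assumes the common "2× mean" adaptive threshold and $\beta^2 = 0.3$ for the adaptive F-measure, which the source does not confirm.

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a soft prediction and a binary ground truth."""
    return np.abs(pred - gt).mean()

def adaptive_f(pred, gt, beta2=0.3, eps=1e-8):
    """Adaptive F-measure: binarize at twice the prediction mean, then
    combine precision and recall with weight beta2 (assumed value 0.3)."""
    thresh = min(2 * pred.mean(), 1.0)
    binary = pred >= thresh
    tp = (binary & (gt > 0.5)).sum()
    prec = tp / (binary.sum() + eps)
    rec = tp / ((gt > 0.5).sum() + eps)
    return (1 + beta2) * prec * rec / (beta2 * prec + rec + eps)

gt = np.zeros((64, 64)); gt[20:40, 20:40] = 1.0   # synthetic square target
pred = 0.9 * gt                                   # confident, well-localized prediction
print(mae(pred, gt), adaptive_f(pred, gt))
```

E-measure and S-measure additionally account for alignment and structural similarity and need the original evaluation code for exact comparability.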
5. Generalization Potential and Dataset Transferability
Though direct cross-dataset evaluation numbers are not presented, the modular spectral–spatial design and prompting mechanisms are postulated to transfer robustly to other public HSI datasets (e.g., HSOD-BIT, HSOD-BIT-V2). The inherent flexibility of HSC-SAM’s decoupled architecture suggests applicability beyond camouflage detection to general HSI segmentation tasks.
A plausible implication is that the spectral saliency-guided prompting, coupled with token dropout, would adapt efficiently to domains where the spectral domain carries the discriminative cues absent from RGB.
6. Computational Complexity and Limitations
HSC-SAM has 11.7 million parameters and requires 94.2 GFLOPs per HSI. Inference runs at ~19.1 FPS with SGTD and ~20.3 FPS without it, at the cost of a −4.2% drop in Adaptive-F when token dropout is disabled.
Limitations include:
- The SGTD threshold must be tuned (optimal at 0.01 for HyperCOD).
- FDE operates only during training; run-time boundary refinement depends on feature quality upstream.
- The framework does not explicitly address robustness to extreme spectral noise or unmodeled sensor variations.
These factors should be considered for deployment in varying HSI acquisition contexts.
7. Significance and Impact
Key contributions of HSC-SAM include:
- Introduction of HyperCOD, filling a benchmark gap for real-world HSI COD with 350 annotated samples and 200 bands.
- Development of HSC-SAM, operationalizing spectral–spatial decomposition, adaptive prompting, token selection, and training-time refinement specifically for large vision models and HSI.
- Establishment of a new SOTA (MAE=0.0017) on HyperCOD, exceeding both RGB and HSI-specific baselines with a more compact computational profile.
- The architecture’s components (SSDM, SGTD, SSCP, FDE) have potential as general modules for future hyperspectral foundation models and other HSI segmentation tasks.
HSC-SAM marks a significant step in integrating hyperspectral priors into large-scale vision modeling, demonstrating that explicitly distilling spectral structure into modern transformer architectures is fundamental for high-fidelity segmentation under spectral camouflage conditions (Bai et al., 7 Jan 2026).