- The paper introduces a novel deep learning framework integrating spatio-spectral attention blocks and illuminant priors for high-dimensional SPD estimation.
- It leverages sensor-agnostic spectral-domain transformation, reducing mean angular errors and generalizing effectively across different MS sensor domains.
- Empirical results on the MILD and BeyondRGB datasets confirm robust SPD recovery and improved white-balancing under challenging illumination conditions.
Spectrum-Aware Deep Illuminant Estimation from Multispectral Images
Introduction
The paper "Spectrum Aware Illumination Estimation Using Multispectral Image" (2606.14248) addresses the challenge of accurate illuminant spectral power distribution (SPD) estimation from multispectral (MS) images, which is critical for downstream tasks such as color constancy, color rendering, and robust computer vision pipelines. The authors propose a deep learning framework integrating spatio-spectral attention mechanisms and illuminant priors into the feature extraction pipeline, enabling effective spectral correlation exploitation and generalization across MS sensor domains. Additionally, a new dataset, MILD, is introduced, capturing diverse lighting conditions, including challenging monochromatic cases, with high-dimensional ground-truth SPD measured using a spectroradiometer.
Architectural Overview
The proposed framework departs from prior methods by explicitly modeling spectral inter-channel relationships within MS images using two successive Spatio-Spectral Feature Extractors (SSFEs), integrating spectral attention blocks and an illuminant prior (IP). The network estimates a high-dimensional illuminant SPD and applies target-domain linear projections via physically defined matrices, such as sensor spectral sensitivity functions (CSF) and color-matching functions (CMF), to allow deployment across heterogeneous camera domains without retraining.
Figure 1: Overall pipeline illustrating high-dimensional illuminant SPD estimation and linear projection to target domains via CSF/CMF matrices.
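The projection step in this pipeline amounts to a matrix product between a sensitivity matrix and the estimated SPD. The following is a minimal sketch of that idea, not the authors' implementation. The Gaussian sensitivity curves, the 36-sample wavelength grid, the channel counts, and the helper names `gaussian_sensitivities` and `project_spd` are all illustrative assumptions standing in for measured CSF/CMF data.

```python
import numpy as np

def gaussian_sensitivities(wavelengths, centers, width):
    """Toy sensitivity curves, one Gaussian per channel (placeholder for measured CSF/CMF data)."""
    return np.exp(-0.5 * ((wavelengths[None, :] - centers[:, None]) / width) ** 2)

def project_spd(spd, sensitivity):
    """Project an SPD of shape (n_wavelengths,) to a target domain of shape (n_channels,)."""
    response = sensitivity @ spd
    # Normalize to unit length: only the direction of the illuminant vector is kept.
    return response / np.linalg.norm(response)

wavelengths = np.linspace(380.0, 730.0, 36)  # assumed 10 nm sampling grid
spd = np.ones(36)                            # equal-energy SPD as a stand-in for a network output

# Two hypothetical target domains served by the same high-dimensional estimate.
rgb_like = gaussian_sensitivities(wavelengths, np.array([450.0, 550.0, 610.0]), 30.0)
ms_like = gaussian_sensitivities(wavelengths, np.linspace(400.0, 700.0, 8), 15.0)

illum_rgb = project_spd(spd, rgb_like)  # 3-channel illuminant
illum_ms = project_spd(spd, ms_like)    # 8-channel illuminant
```

Under these assumptions, switching the target camera only means swapping the sensitivity matrix; the SPD estimate itself is reused unchanged.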
The architectural backbone leverages 3D convolutions for joint spatial and spectral processing, followed by two spectral attention modules.
MILD Dataset and Sensor Characteristics
The MILD dataset comprises 15-channel MS images sampled from 380–835 nm, acquired under both natural and artificially synthesized illuminants, including 42 mono-wavelength sources that deviate substantially from the Planckian locus, posing severe challenges for RGB-based approaches.
Figure 3: MILD dataset scene examples, top: without reference, bottom: with reference color charts and white standards.
Each image is accompanied by spectroradiometer-measured GT SPD (36 channels, 380–730 nm). The sensor response curves for each channel are profiled, enabling principled spectral-domain transformation via CSF matrices.
Figure 4: Normalized spectral response for each MS sensor channel over [0,1], characterizing channel selectivity.
Mapping of illuminants in chromaticity space visualizes the dataset's coverage, emphasizing its spectrum diversity and inclusion of non-standard mono-wavelength sources.
Figure 5: CIE-xy chromaticity coordinates for the MILD dataset's illuminants; (a) non-mono-wavelength, (b) mono-wavelength spectra.
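Chromaticity coordinates of this kind are obtained by projecting an SPD through color-matching functions and normalizing the tristimulus values. The sketch below illustrates the computation only. The single-Gaussian curves in `toy_cmf` are an assumption and are not the CIE 1931 tables, so the numeric coordinates it produces are not comparable to the figure. The function names and the 630 nm spike are hypothetical.

```python
import numpy as np

wavelengths = np.linspace(380.0, 730.0, 36)  # assumed 10 nm grid over 380-730 nm

def toy_cmf(wl):
    """Single-Gaussian stand-ins for x-bar, y-bar, z-bar (NOT the CIE 1931 tables)."""
    centers = np.array([600.0, 555.0, 450.0])
    return np.exp(-0.5 * ((wl[None, :] - centers[:, None]) / 40.0) ** 2)

def xy_chromaticity(spd, cmf):
    """Map an SPD (n_wavelengths,) to (x, y) using a CMF matrix of shape (3, n_wavelengths)."""
    X, Y, Z = cmf @ spd
    total = X + Y + Z
    return X / total, Y / total

cmf = toy_cmf(wavelengths)

broadband = np.ones_like(wavelengths)                # flat, broadband illuminant
mono = np.zeros_like(wavelengths)
mono[np.argmin(np.abs(wavelengths - 630.0))] = 1.0   # single spike near 630 nm

x_broad, y_broad = xy_chromaticity(broadband, cmf)
x_mono, y_mono = xy_chromaticity(mono, cmf)
```

With real CMF data, narrowband sources land near the spectral locus, far from the cluster of broadband illuminants; the toy curves reproduce only the qualitative shift toward larger x for the red spike.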
Spectral-Domain Transformation and Generalization
A key contribution is the sensor-agnostic spectral-domain transformation, enabling a single high-dimensional illumination estimator to generalize across arbitrary sensor domains and color spaces. Linear mappings via CSF and CMF matrices ensure that only the target-observable spectral eigen-directions contribute, as demonstrated via SVD analysis. This approach circumvents the need for retraining per sensor, underpinning practical deployments.
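The SVD argument can be illustrated numerically: any SPD component lying in the null space of the sensitivity matrix leaves the projected response unchanged, so only the row-space (observable) directions matter for a given target. The sketch below uses a random matrix `S` as a hypothetical stand-in for a CSF or CMF matrix; the dimensions (36 wavelengths, 3 channels) are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_wavelengths, n_channels = 36, 3
S = rng.random((n_channels, n_wavelengths))  # hypothetical sensitivity matrix (full row rank)

# Full SVD: the rows of Vt form an orthonormal basis of the spectral space.
U, sigma, Vt = np.linalg.svd(S, full_matrices=True)
observable = Vt[:n_channels]    # directions the target sensor can see
unobservable = Vt[n_channels:]  # null-space directions, invisible to the sensor

spd = rng.random(n_wavelengths)

# Perturb the SPD only along unobservable directions.
perturbation = unobservable.T @ rng.standard_normal(n_wavelengths - n_channels)
same_response = np.allclose(S @ spd, S @ (spd + perturbation))

# Keeping only the observable part of the SPD reproduces the same sensor response.
spd_observable = observable.T @ (observable @ spd)
same_after_truncation = np.allclose(S @ spd, S @ spd_observable)
```

Both checks evaluate to `True` here, which is the sense in which errors along unobservable directions do not affect the projected illuminant for that target.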
Empirical Results and Benchmarking
Extensive experiments demonstrate the superiority and stability of the proposed method. On the BeyondRGB dataset, the framework achieves a mean angular error (AE) of 2.51° (lab) and 4.92° (field), outperforming previous SOTA (BeyondRGB: 5.92° and 7.22°). On MILD, the mean AE is 3.18°, with robust estimation in mono-wavelength cases (MILD(m): 7.44°). Ablation studies confirm substantial performance gains from spectral attention and IP integration.
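For reference, angular error is conventionally the angle between the estimated and ground-truth illuminant vectors, which makes it invariant to overall intensity. The sketch below shows that standard definition; the domain in which the paper evaluates it (full SPD or projected sensor response) is not stated here, and the function name `angular_error_deg` is an assumption.

```python
import numpy as np

def angular_error_deg(estimate, ground_truth):
    """Angle in degrees between two illuminant vectors of equal length."""
    e = np.asarray(estimate, dtype=float)
    g = np.asarray(ground_truth, dtype=float)
    cosine = (e @ g) / (np.linalg.norm(e) * np.linalg.norm(g))
    # Clip guards against values slightly outside [-1, 1] caused by rounding.
    return float(np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0))))

# Scaling an illuminant does not change the error; orthogonal vectors are 90 degrees apart.
err_scaled = angular_error_deg([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
err_orthogonal = angular_error_deg([1.0, 0.0], [0.0, 1.0])
```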
Qualitative comparisons on extreme spectral scenes show near-perfect curve fitting to ground-truth SPDs, maintaining distinct spectral peaks where RGB-based models fail.
Figure 6: Qualitative comparison of SPD estimation across methods on MILD; proposed model accurately fits mono-wavelength ground truth.
White balancing using estimated SPD further demonstrates practical utility, producing visually stable color renderings under challenging illumination.
Figure 7: White-balancing results on MILD; proposed model achieves closest color match to GT illuminant-corrected image.
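One common way to white-balance with an illuminant estimate is a diagonal (von Kries-style) correction: project the SPD to the sensor channels, then divide each image channel by the corresponding illuminant response. The sketch below shows that generic model under assumed values; it is not claimed to be the correction used in the paper, and `white_balance` is a hypothetical helper.

```python
import numpy as np

def white_balance(image, illuminant, eps=1e-8):
    """Diagonal correction. image: (H, W, C); illuminant: (C,) sensor response to the light."""
    illuminant = np.asarray(illuminant, dtype=float)
    # Scale gains by the mean so overall brightness stays roughly unchanged.
    gains = illuminant.mean() / np.maximum(illuminant, eps)
    return image * gains

# Synthetic example: a uniform gray surface lit by a reddish illuminant.
illuminant = np.array([1.5, 1.0, 0.5])
reflectance = np.full((2, 2, 3), 0.4)
observed = reflectance * illuminant

corrected = white_balance(observed, illuminant)  # gray is restored in every channel
```

In this toy case the corrected image equals the gray reflectance exactly, because the illuminant passed to the correction is the one that produced the image; with an estimated illuminant, the residual color cast grows with the estimation error.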
Physical Modeling for Multi-Source and Intensity Estimation
Extensibility is validated using physical light attenuation models (Yuksel's point attenuation), combining estimated SPD and spatial intensity fitting for both single and multi-source scenarios. The fitted models achieve R² > 0.97 and low AE (direct: 2.48°–5.57°), confirming the estimated spectral shape's sufficiency for intensity-level modeling and spectral superposition in controlled environments.
Figure 8: Single-source illumination estimation with spatial attenuation model; direct regions exhibit strong agreement with fitted SPD.
Figure 9: Multi-source spectral superposition verification and spatial intensity fitting, confirming linear combination and accurate intensity recovery.
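The superposition check can be sketched as a linear least-squares problem: if a mixed spectrum is a linear combination of known source SPDs, the mixing weights are recoverable and the fit quality can be summarized by R². The example below uses synthetic Gaussian sources, assumed weights, and an assumed noise level; it does not reproduce the attenuation model or the paper's measurements.

```python
import numpy as np

wavelengths = np.linspace(380.0, 730.0, 36)  # assumed 10 nm grid

def peak(center, width):
    """Synthetic Gaussian-shaped source SPD (illustrative only)."""
    return np.exp(-0.5 * ((wavelengths - center) / width) ** 2)

source_a = peak(460.0, 20.0)
source_b = peak(620.0, 30.0)
true_weights = np.array([0.7, 0.3])

# Simulated measurement: linear mixture of the two sources plus small noise.
rng = np.random.default_rng(0)
mixed = (true_weights[0] * source_a + true_weights[1] * source_b
         + rng.normal(0.0, 0.005, wavelengths.size))

# Recover the mixing weights by ordinary least squares.
basis = np.stack([source_a, source_b], axis=1)  # shape (36, 2)
weights, *_ = np.linalg.lstsq(basis, mixed, rcond=None)

# Coefficient of determination of the fitted mixture.
fitted = basis @ weights
ss_res = np.sum((mixed - fitted) ** 2)
ss_tot = np.sum((mixed - mixed.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot
```

A high R² in this setup indicates that the measured spectrum is well explained by a linear combination of the individual sources, which is the property the multi-source experiment relies on.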
Complexity and Ablation
Complexity analysis shows negligible overhead for high-dimensional SPD estimation, with only a minor increase in parameter count and inference time compared to direct low-dimensional prediction. Ablation studies highlight the impact of spectral attention and physical prior integration, with IP providing non-trivial gains over random or learnable vectors.
Implications and Future Directions
The framework provides robust, generalizable SPD estimation across MS sensor domains, supporting critical vision pipelines such as color correction and white balancing in real-world, non-standard illumination. The sensor-agnostic spectral-domain projection paves the way for versatile single-model deployments, while physical attenuation integration facilitates modeling of spatially varying illumination and multi-source environments.
Future work should extend spatially varying illuminant estimation to unconstrained environments, incorporating neural source detection, occlusion-aware attenuation, and real-world indirect illumination modeling. Further research on end-to-end spatial spectral estimation and integration with downstream vision models is warranted.
Conclusion
This work establishes a scalable, physically grounded paradigm for MS-based illuminant estimation, combining deep spectral attention, physical priors, and sensor-agnostic transformation to achieve robust SPD recovery and practical deployment. The MILD dataset, with its diversity of lighting scenarios, also contributes substantial benchmarks for advancing research in spectral imaging and color constancy.