Spectral 3D-STEM Methods
- Spectral 3D-STEM refers to approaches that reconstruct 3D structures by leveraging spectrally resolved measurements—such as wavelength bands, diffraction coordinates, or energy-loss channels—instead of single scalar images.
- These techniques rely on physics-based forward models such as multislice propagation and Beer–Lambert operators, together with Fourier-domain diagnostics, and report gains on reconstruction metrics such as PSNR and SSIM.
- Common challenges include spectral co-registration, computational intensity, and non-uniqueness in intensity inversions, prompting ongoing research in efficient spectral encoding and robust reconstruction algorithms.
“Spectral 3D-STEM” (Editor’s term) denotes a family of three-dimensional reconstruction, rendering, and analysis methods in which 3D structure is estimated, rendered, or interrogated through spectrally resolved measurements rather than through a single scalar image channel. In the recent literature, this designation spans at least three distinct settings: multi-spectral 3D scene representations that model wavelength-dependent appearance and semantics (Sinha et al., 2024), reciprocal-space-resolved 4D-STEM reconstructions that infer depth from diffraction-channel variation (Brown et al., 2020, Lee et al., 2022, Niermann et al., 26 Aug 2025), and hyperspectral spectrum-imaging workflows in which each spatial location carries a high-dimensional EELS or cathodoluminescence signature (Monier et al., 2020, Zobelli et al., 2019). Adjacent work in 2D-to-3D feature analysis and spectral CT further indicates that spectral consistency, compressed forward models, and physics-informed priors are recurrent organizing principles in high-dimensional 3D inverse problems (Xiao et al., 6 Mar 2026, Jiang et al., 28 Mar 2025).
1. Scope and meanings of spectrality
Across the cited literature, “spectral” is not used in a single sense. In some works it refers to wavelength-resolved appearance or illumination; in others it refers to reciprocal-space or detector-space channels; in others it denotes energy-resolved spectrum imaging. This distinction is essential, because methods that are all “spectral” may differ radically in sensing physics, forward models, and reconstruction objectives.
| Interpretation of “spectral” | Measured axis | Representative systems |
|---|---|---|
| Multispectral appearance | Wavelength bands | SpectralGaussians, DDSL |
| Reciprocal-space-resolved STEM | Diffraction coordinates or systematic-row | Single-projection 4D-STEM, MSET, 3D strain inversion |
| Hyperspectral spectrum imaging | EELS energy-loss axis or CL spectrum | CLS, random-scan hyperspectral STEM |
| Spectral diagnostics of features | Fourier spectrum of dense feature maps | Feature-upsampler probing |
In the multispectral scene-rendering literature, spectrality is bandwise and explicitly tied to wavelength-dependent appearance, reflectance, and lighting (Sinha et al., 2024, Shin et al., 2024). In 4D-STEM reconstruction, by contrast, the information-rich dimension is the angular or reciprocal-space distribution of scattering rather than energy-resolved chemistry (Brown et al., 2020, Lee et al., 2022). In STEM-EELS and cathodoluminescence, the third axis is a conventional spectral axis attached to each scan position, yielding a true hyperspectral cube (Monier et al., 2020, Zobelli et al., 2019). Recent analysis of 2D-to-3D front ends introduces yet another usage, where “spectral” means the Fourier-domain structure of dense feature maps and its preservation through upsampling (Xiao et al., 6 Mar 2026).
A common misconception is therefore that spectral 3D methods are necessarily energy-resolved or chemically specific. The cited work shows a broader picture: spectrality may refer to wavelength, momentum transfer, detector angle, or Fourier frequency, provided that the extra channel dimension is retained rather than collapsed.
2. Multispectral 3D scene representations and rendering
A canonical formulation of spectral 3D scene representation is “SpectralGaussians” (Sinha et al., 2024). The method extends vanilla 3D Gaussian Splatting from RGB radiance modeling to multi-spectral scene reconstruction, rendering, segmentation, and editing. Each Gaussian retains explicit 3D geometry and opacity, but its appearance channels become bandwise. For each spectral band, each Gaussian stores a diffuse color, a specular tint, a roughness, and an Identity Encoding vector of length $16$. Lighting is estimated per spectrum by a differentiable environment light map, and view dependence is no longer represented by spherical harmonics for color but by a physically inspired shading function.
The per-spectrum color model is a shading function of these stored quantities, the surface normal, and the direct specular light term. Semantics are embedded directly in the Gaussian representation rather than added as a post hoc 2D predictor. The rendered identity feature for a pixel and spectral band follows the standard front-to-back alpha-compositing pattern applied to semantic features, which in generic notation reads
$$E_{\mathrm{id}} = \sum_{i} e_i\,\alpha_i \prod_{j<i} (1 - \alpha_j),$$
where $e_i$ is the Identity Encoding of the $i$-th depth-sorted Gaussian and $\alpha_i$ is its effective opacity at that pixel. The method uses segmentation maps generated with Segment Anything Model and DEVA / “Tracking Anything,” and optimizes a per-spectrum objective combining an $L_1$ term, D-SSIM, 2D identity supervision, and 3D grouping regularization. An important practical detail is a warm-up phase of 1000 iterations without full-spectrum maps, followed by initialization of full-spectrum BRDF parameters and normals by averaging values from the other spectra.
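The front-to-back compositing of identity features described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the SpectralGaussians implementation: the function name `composite_features`, the toy 3-dimensional features, and the opacity values are all assumptions made for the example.

```python
import numpy as np

def composite_features(features, alphas):
    """Front-to-back alpha compositing of per-Gaussian feature vectors.

    features: (N, D) array, one feature vector per depth-sorted Gaussian.
    alphas:   (N,) effective opacities in [0, 1], nearest Gaussian first.
    Returns the composited (D,) feature and the (N,) blending weights.
    """
    features = np.asarray(features, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    # Transmittance reaching Gaussian i: product of (1 - alpha_j) for j < i.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = alphas * transmittance
    return weights @ features, weights

# Toy example (hypothetical values): three Gaussians with one-hot features.
feats = np.eye(3)
alphas = np.array([0.5, 0.5, 1.0])
pixel_feature, weights = composite_features(feats, alphas)
```

In a per-spectrum setting the same routine would simply be called once per band with that band's sorted Gaussians; the opaque last Gaussian in the toy example makes the weights sum to one.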
The representation is explicitly bandwise rather than continuous over wavelength. The paper uses 5-band, 8-band, and 10-band settings, and repeatedly emphasizes that reflectance and lighting are estimated per spectrum rather than absorbed into an unconstrained radiance field. Quantitatively, the method reports strong gains over prior spectral NeRF baselines. On the SpectralNeRF synthetic benchmark, average performance improves from PSNR 33.610 / SSIM 0.9349 / LPIPS 0.0733 to PSNR 38.456 / SSIM 0.9801 / LPIPS 0.0438; on CrossSpectralNeRF multispectral testing, it improves from 33.87 / 0.918 to 35.17 / 0.962; and on the spectral shiny Blender benchmark, it improves from 31.94 PSNR / 0.957 SSIM / 0.068 LPIPS to 34.524 PSNR / 0.969 SSIM / 0.053 LPIPS (Sinha et al., 2024).
An optical counterpart appears in “Dense Dispersed Structured Light for Hyperspectral 3D Imaging of Dynamic Scenes” (Shin et al., 2024). DDSL jointly estimates a depth map and a visible-range hyperspectral image from stereo RGB cameras, an RGB projector, and a diffraction grating film. It reconstructs 23 bands from 440 nm to 660 nm in 10 nm steps using 8 projected DDSL patterns + 1 black pattern, and reports 15.5 nm FWHM spectral resolution, 4 mm depth error, and 6.6 fps operation. The measurement model is linear in the unknown hyperspectral image once geometry is known, and the reconstruction solves a regularized inverse problem with spectral smoothness and spatial total-variation terms. Relative to SpectralGaussians, DDSL is not a 3D neural scene representation, but it shows that spectral 3D estimation can also be driven by coded illumination and calibrated geometry rather than by learned volumetric radiance models.
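To make the linear inverse problem concrete, the sketch below solves a per-pixel spectral recovery with a spectral-smoothness penalty. It is an assumption-laden toy: the measurement matrix `A` is random rather than a calibrated dispersion model, the 27 measurements (9 patterns times 3 color channels) are a hypothetical choice, and the spatial total-variation term is omitted. Only the band grid (440 nm to 660 nm in 10 nm steps) comes from the text.

```python
import numpy as np

# Band grid stated in the text: 440 nm to 660 nm in 10 nm steps.
bands_nm = np.arange(440, 661, 10)

def solve_spectrum(A, m, mu):
    """Minimize ||A h - m||^2 + mu * ||D h||^2, with D the first difference."""
    n = A.shape[1]
    D = np.diff(np.eye(n), axis=0)  # (n-1, n) first-difference operator
    return np.linalg.solve(A.T @ A + mu * D.T @ D, A.T @ m)

def objective(A, m, mu, h):
    """Value of the regularized least-squares objective at h."""
    return np.sum((A @ h - m) ** 2) + mu * np.sum(np.diff(h) ** 2)

rng = np.random.default_rng(0)
n_meas = 27  # hypothetical: 9 projected patterns x 3 camera channels
A = rng.uniform(size=(n_meas, bands_nm.size))        # stand-in for calibration
h_true = np.exp(-0.5 * ((bands_nm - 550.0) / 40.0) ** 2)
m = A @ h_true + 0.01 * rng.standard_normal(n_meas)  # noisy measurements
h_hat = solve_spectrum(A, m, mu=0.1)
```

Because the objective is a convex quadratic, the closed-form solve returns its exact minimizer; a TV-regularized variant would need an iterative solver instead.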
3. Reciprocal-space-resolved 3D STEM
In STEM, a major branch of spectral 3D methods uses reciprocal-space-resolved measurements rather than scalar detector integrals. “Multislice Electron Tomography using 4D-STEM” introduces MSET, which reconstructs a 3D electrostatic potential directly from a 4D-STEM tilt series using a multislice forward model (Lee et al., 2022). For each tilt, the data are a 4D-STEM dataset, and the reconstruction minimizes the mismatch between measured and simulated diffraction patterns over all probe positions and tilt angles. The forward model is the paraxial Schrödinger equation,
$$\frac{\partial \psi(\mathbf{r})}{\partial z} = \frac{i\lambda}{4\pi}\,\nabla_{xy}^{2}\,\psi(\mathbf{r}) + i\sigma V(\mathbf{r})\,\psi(\mathbf{r}),$$
where $\psi$ is the electron wavefunction, $\lambda$ the electron wavelength, $\sigma$ the interaction constant, and $V$ the electrostatic potential. It is implemented slice-by-slice through transmission functions and Fresnel propagation. Unlike ADF-STEM tomography, which compresses each scan point to one scalar intensity and relies on a projection approximation, MSET retains the full detector-space signal and explicitly models multiple scattering.
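The slice-by-slice scheme (a transmission function followed by Fresnel propagation) can be sketched as below. This is a generic textbook-style multislice loop under assumed conventions, not the MSET code: the function name, the propagator sign convention, and all numerical values (grid size, wavelength, interaction constant, random potential) are illustrative assumptions.

```python
import numpy as np

def multislice_diffraction(probe, slices, sigma, wavelength, dz, pixel_size):
    """Propagate a probe through projected-potential slices and return
    the far-field diffraction intensity.

    probe:  (n, n) complex incident wave in real space
    slices: iterable of (n, n) projected potentials, one per slice
    """
    n = probe.shape[0]
    k = np.fft.fftfreq(n, d=pixel_size)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    # Fresnel propagator over one slice thickness, applied in Fourier space.
    propagator = np.exp(-1j * np.pi * wavelength * dz * (kx**2 + ky**2))
    psi = probe.astype(complex)
    for v in slices:
        psi = psi * np.exp(1j * sigma * v)                 # transmission
        psi = np.fft.ifft2(np.fft.fft2(psi) * propagator)  # free-space step
    return np.abs(np.fft.fft2(psi, norm="ortho")) ** 2

# Illustrative numbers only (lengths in angstrom).
n = 32
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:n, 0:n] - n / 2
probe = np.exp(-(xx**2 + yy**2) / 18.0).astype(complex)
probe /= np.sqrt(np.sum(np.abs(probe) ** 2))               # unit total intensity
slices = rng.uniform(0.0, 100.0, size=(5, n, n))           # hypothetical potentials
pattern = multislice_diffraction(probe, slices, sigma=6.5e-4,
                                 wavelength=0.0197, dz=2.0, pixel_size=0.2)
```

Both steps are pure phase multiplications, so the total intensity of the simulated pattern equals that of the incident probe. A reconstruction in the spirit of the text would compare such simulated patterns against measured ones for every probe position and tilt.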
The reported consequences are improved localization, fewer reconstruction artifacts from nonlinear contrast, better low-Z sensitivity, and higher dose efficiency. The paper quantifies this through the RMSD of atomic positions, the tracing error, and the classification rate of the recovered CuAu model under ideal simulation conditions. With fewer scan positions and a reduced total tilt-series dose, MSET reports RMSD 10.1 pm, tracing error 0.7%, and total classification rate 97.3%, outperforming an ADF-STEM result obtained at approximately 100× more dose (Lee et al., 2022). For BaO, the method is especially significant because it improves oxygen recovery near Ba, a regime where high-angle ADF contrast is intrinsically unfavorable.
A different route to 3D information from reciprocal-space data is the focal-series $\mathcal{S}$-matrix method for single-orientation 4D-STEM (Brown et al., 2020). Here the experiment records a focal series of 4D-STEM datasets with the sample in a single plan-view orientation, and the reconstruction first estimates the specimen scattering matrix $\mathcal{S}$ from intensity-only measurements of the form
$$I(\mathbf{g};\mathbf{R},\Delta f) = \Big|\sum_{\mathbf{h}} \mathcal{S}_{\mathbf{g},\mathbf{h}}\,\psi_0(\mathbf{h};\mathbf{R},\Delta f)\Big|^{2},$$
where $\psi_0$ is the incident probe in reciprocal space for probe position $\mathbf{R}$ and defocus $\Delta f$, and $\mathbf{g}$ is the detector coordinate. Depth sectioning then exploits a momentum-dependent parallax shift: for a chosen depth, the recovered $\mathcal{S}$-matrix is numerically refocused and de-sheared so that features from that plane align while features at other depths do not. The method was demonstrated on a plan-view heterostructure, where Pb-Ir atomic columns were visible in the uppermost reconstructed layers. It is therefore not tomography in the usual multi-tilt sense, but it is a depth-sensitive reciprocal-space inversion from a single orientation.
A third formulation uses dynamical diffraction itself as the depth encoding mechanism. “3D Strain Field Reconstruction by Inversion of Dynamical Scattering” reconstructs a depth-resolved strain-like field from SCBED-derived plots under systematic-row conditions (Niermann et al., 26 Aug 2025). The forward model is the Darwin–Howie–Whelan equation with a depth-dependent strain perturbation in the excitation error term, and the inverse problem is solved by nonlinear regularized optimization over a coarse grid. A central limitation is an exact symmetry of the diffracted intensities, which yields mirror-depth ambiguities in intensity-only reconstructions. The paper nevertheless demonstrates simulated recovery of buried-strain geometry and an experimental reconstruction of an inclined pseudomorphic layer in a GaN lamella. In this line of work, the “spectral” channels are the reciprocal-space coordinates and diffracted-beam responses, not energy-loss channels.
4. Hyperspectral STEM data cubes, sparse acquisition, and noise-aware reconstruction
In spectrum-imaging STEM, the data model is a 3D hyperspectral cube rather than a diffraction-resolved 4D dataset. “Fast reconstruction of atomic-scale STEM-EELS images from sparse sampling” treats a STEM-EELS spectrum-image as a matrix $\mathbf{X} \in \mathbb{R}^{M \times P}$, where $M$ is the number of energy-loss bands and $P$ is the number of spatial pixels (Monier et al., 2020). Spatially sparse acquisition can be written as
$$\mathbf{Y}_{\mathcal{I}} = \mathbf{X}_{\mathcal{I}} + \mathbf{E},$$
where $\mathcal{I}$ indexes the sampled pixels (columns) and $\mathbf{E}$ is noise, and reconstruction solves a convex program of the form
$$\hat{\mathbf{X}} = \arg\min_{\mathbf{X}} \; \tfrac{1}{2}\,\|\mathbf{Y}_{\mathcal{I}} - \mathbf{X}_{\mathcal{I}}\|_F^{2} + \lambda\,\|\mathbf{X}\boldsymbol{\Psi}\|_{2,1},$$
where $\boldsymbol{\Psi}$ is a band-by-band 2D DCT and the $\ell_{2,1}$ norm promotes joint sparsity across energy channels. The implementation uses PCA preprocessing and FISTA. The paper focuses on 20% random spatial sampling and shows that the proposed CLS method nearly closes the quality gap to 3D dictionary-learning methods while remaining orders of magnitude faster. In one reported comparison, CLS reaches SNR 36.18, SSIM 0.912, and runtime 18.4 s, whereas BPFA reaches SNR 37.02 and SSIM 0.933 at a substantially longer runtime, with aSAD also reported for both methods (Monier et al., 2020). The method is therefore explicitly designed for atomic-scale spectral STEM, with the key prior being shared spatial periodicity across EELS bands.
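A joint-sparsity reconstruction of this kind can be sketched with FISTA and a group soft-threshold in the DCT domain. The code below is a simplified stand-in, not the CLS implementation: it omits PCA preprocessing, builds the orthonormal DCT-II matrix by hand, and uses a synthetic 4-band image that is exactly sparse in the DCT domain. The function names, the regularization weight, and the test signal are all assumptions.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size (n, n)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def group_soft_threshold(z, tau):
    """Shrink each DCT coefficient jointly across the band axis (axis 0)."""
    norms = np.linalg.norm(z, axis=0, keepdims=True)
    return z * np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)

def fista_l21(y, mask, lam, n_iter=300):
    """Minimize 0.5*||mask*(idct(z) - y)||^2 + lam*||z||_{2,1} over z.

    y:    (bands, n, n) zero-filled measurements
    mask: (n, n) binary sampling mask shared by all bands
    """
    c = dct_matrix(y.shape[1])
    dct2 = lambda x: c @ x @ c.T       # band-by-band 2D DCT
    idct2 = lambda z: c.T @ z @ c      # inverse (c is orthonormal)
    z = np.zeros_like(y)
    w, t = z.copy(), 1.0
    for _ in range(n_iter):
        grad = dct2(mask * (idct2(w) - y))            # Lipschitz constant is 1
        z_new = group_soft_threshold(w - grad, lam)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        w = z_new + ((t - 1.0) / t_new) * (z_new - z)
        z, t = z_new, t_new
    return idct2(z)

# Synthetic periodic test image: two DCT atoms shared by four bands.
n, bands = 16, 4
c = dct_matrix(n)
coeffs = np.zeros((bands, n, n))
coeffs[:, 2, 3] = [4.0, 3.0, 2.0, 1.0]
coeffs[:, 5, 1] = [1.0, 2.0, 3.0, 4.0]
x_true = c.T @ coeffs @ c
rng = np.random.default_rng(0)
mask = (rng.uniform(size=(n, n)) < 0.2).astype(float)   # about 20% sampling
x_hat = fista_l21(mask * x_true, mask, lam=0.05)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
```

Grouping coefficients across bands is what encodes the shared-periodicity prior: a spatial frequency is either kept in all energy channels or suppressed in all of them.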
The acquisition side of spectrally resolved STEM is extended by “Spatial and spectral dynamics in STEM hyperspectral imaging using random scan patterns” (Zobelli et al., 2019). This work introduces a hardware-level random scan mode in which the full pixel set is visited exactly once in a fully shuffled order. The point is not merely sparse sampling but decoupling space and time in hyperspectral acquisition. In EELS and nano-CL, the resulting data can be reorganized into time windows, reconstructed by BPFA, and registered by cross-correlation. In a CL example, the average distance between successive illuminated pixels is about 414 nm in random mode versus 8.5 nm in raster mode, reducing dose accumulation effects and charging artifacts. For HAADF/EELS drift correction, the authors segment the acquisition into 10 successive windows, each containing approximately 10% of the total pixels. The method enables atomically resolved elemental maps in EELS and time-resolved analysis of blinking and spectral diffusion in h-BN nano-CL, including emissions at 3.21 eV, 3.04 eV, 2.15 eV, and 1.73 eV (Zobelli et al., 2019).
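The scan logic described here (visit every pixel once in shuffled order, then split the acquisition into time windows) is easy to sketch. The example below is a software illustration only, not the hardware scan engine of the paper; the 64 by 64 grid, the pixel size, and the function names are assumptions.

```python
import numpy as np

def scan_orders(n_rows, n_cols, seed=0):
    """Return raster and fully shuffled visiting orders of all pixel indices."""
    raster = np.arange(n_rows * n_cols)
    shuffled = np.random.default_rng(seed).permutation(raster)
    return raster, shuffled

def mean_step(order, n_cols, pixel_size):
    """Average distance between successively visited pixels."""
    rows, cols = np.divmod(order, n_cols)
    return (np.hypot(np.diff(rows), np.diff(cols)) * pixel_size).mean()

n_rows = n_cols = 64
pixel_size_nm = 8.5                      # hypothetical pixel pitch
raster, shuffled = scan_orders(n_rows, n_cols)

# Ten successive time windows, each holding roughly 10% of the pixels.
windows = np.array_split(shuffled, 10)

raster_step = mean_step(raster, n_cols, pixel_size_nm)
random_step = mean_step(shuffled, n_cols, pixel_size_nm)
```

Each window is a sparse but spatially uniform subsample of the full frame, which is why it can be inpainted and then registered against the other windows for drift correction.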
A methodological precursor for such pipelines is the implicit 3D STEM reconstruction framework of “Clean Implicit 3D Structure from Noisy 2D STEM Images” (Kniesel et al., 2022). That paper is not itself a spectral STEM method, but it jointly learns a continuous implicit 3D density field, a differentiable STEM forward model, and a signal-dependent conditional normalizing-flow noise model directly from noisy 2D tilt images. The reconstruction objective is maximum-likelihood rather than $L_1$ or $L_2$ fitting to noisy projections, and the reported unsupervised method reaches 2D PSNR 19.93 and 3D PSNR 21.75 on synthetic data (Kniesel et al., 2022). A plausible implication is that the same architecture (implicit 3D field, modality-specific forward model, and learned conditional noise likelihood) could be generalized from scalar STEM to multi-channel spectral STEM, although that extension is not carried out in the paper.
5. Spectral diagnostics, compressed forward models, and adjacent modalities
A recurrent question in spectral 3D pipelines is not only how to reconstruct from many channels, but which spectral properties of intermediate representations matter for 3D quality. “Spectral Probing of Feature Upsamplers in 2D-to-3D Scene Reconstruction” answers this for VFM-based 2D-to-3D systems by introducing six Fourier-domain diagnostics: Structural Spectral Consistency (SSC), Band-wise Spectral Drift (BWG), High-Frequency Spectral Slope Drift (HFSS), Complex Spectral Coherence (CSC), Angular Energy Consistency (ADC), and Mid-band Concentration Stability (MCS) (Xiao et al., 6 Mar 2026). Across CLIP and DINO backbones and several upsamplers, the paper finds that SSC/CSC are the strongest predictors of novel-view synthesis quality, whereas HFSS often correlates negatively. It also distinguishes geometry-sensitive and texture-sensitive spectral properties: ADC correlates more strongly with geometry-related metrics, while SSC/CSC influence texture fidelity slightly more than geometric accuracy. A practically important result is that learnable upsamplers often do not outperform classical interpolation in 3D reconstruction. For example, with DINO+DUSt3R under All, bicubic reaches 24.15 PSNR / 0.8416 SSIM / 0.1577 LPIPS, slightly beating or matching several learnable alternatives (Xiao et al., 6 Mar 2026). In this literature, spectral preservation rather than local sharpness is the operative principle.
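The general idea of a Fourier-domain diagnostic can be illustrated with two simple quantities: a normalized complex coherence between two maps and the fraction of spectral power above a cutoff frequency. These are generic stand-ins chosen for illustration and are not the paper's definitions of SSC, CSC, HFSS, or any other named metric; the function names, the cutoff, and the test maps are assumptions.

```python
import numpy as np

def spectral_coherence(a, b):
    """Normalized magnitude of the complex inner product of two 2D spectra."""
    fa, fb = np.fft.fft2(a), np.fft.fft2(b)
    num = np.abs(np.sum(fa * np.conj(fb)))
    den = np.sqrt(np.sum(np.abs(fa) ** 2) * np.sum(np.abs(fb) ** 2))
    return num / den

def high_frequency_fraction(a, cutoff=0.25):
    """Share of spectral power at radial frequency above `cutoff` (cycles/pixel)."""
    fy = np.fft.fftfreq(a.shape[0])[:, None]
    fx = np.fft.fftfreq(a.shape[1])[None, :]
    power = np.abs(np.fft.fft2(a)) ** 2
    return power[np.hypot(fy, fx) > cutoff].sum() / power.sum()

# A smooth reference map and a "sharpened" version with added checkerboard detail.
y, x = np.mgrid[0:32, 0:32]
smooth = np.sin(2 * np.pi * 2 * x / 32)
sharpened = smooth + 0.5 * (-1.0) ** (x + y)

coh = spectral_coherence(smooth, sharpened)
hf_smooth = high_frequency_fraction(smooth)
hf_sharp = high_frequency_fraction(sharpened)
```

In this toy case the added detail raises the high-frequency share while lowering coherence with the reference, which mirrors the qualitative point that sharper feature maps can be spectrally less faithful.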
A more explicitly physics-based adjacent development appears in “Volumetric Material Decomposition Using Spectral Diffusion Posterior Sampling with a Compressed Polychromatic Forward Model” (Jiang et al., 28 Mar 2025). The setting is spectral CT rather than STEM, but the paper is relevant because it combines a nonlinear spectral forward operator with a learned prior in full 3D. The forward model compresses a 150-bin spectral discretization into a much smaller set of learned effective bins while preserving Beer–Lambert physics. The compressed model achieves projection error below 0.01% on the calibration grid and less than 0.1% patient-region error in digital phantom testing. For a two-material volume, the compressed-model Spectral DPS requires 2680 s and 15.9 GB GPU memory, versus an estimated 4020 s and about 108.2 GB for the full 150-bin model (Jiang et al., 28 Mar 2025). The paper explicitly presents diffusion posterior sampling plus a compressed analytic forward model as transferable to other spectral 3D modalities, while also stating that the Beer–Lambert transmission operator is CT-specific and would need replacement by the correct modality-specific forward model.
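The nonlinear polychromatic Beer–Lambert operator, and what compressing its energy bins means, can be sketched for a single ray. The code is a toy under stated assumptions: the compression here is naive merging of adjacent bins, not the learned effective bins of the paper, and the spectrum, attenuation curves, path lengths, and the choice of 6 groups are all hypothetical.

```python
import numpy as np

def projection(weights, atten, lengths):
    """-log of the spectrum-weighted Beer-Lambert transmission along one ray.

    weights: (E,) spectral weights summing to 1
    atten:   (K, E) attenuation of K materials per energy bin (1/cm)
    lengths: (K,) material path lengths along the ray (cm)
    """
    return -np.log(np.sum(weights * np.exp(-(lengths @ atten))))

def compress_bins(weights, atten, n_groups):
    """Naive stand-in for learned compression: merge adjacent energy bins."""
    w = weights.reshape(n_groups, -1)
    a = atten.reshape(atten.shape[0], n_groups, -1)
    w_c = w.sum(axis=1)
    a_c = (a * w).sum(axis=2) / w_c      # weighted mean attenuation per group
    return w_c, a_c

energies = np.linspace(20.0, 140.0, 150)               # keV, 150 bins as in the text
weights = np.exp(-0.5 * ((energies - 70.0) / 25.0) ** 2)
weights /= weights.sum()
atten = np.stack([0.2 * (60.0 / energies) ** 3 + 0.18,  # hypothetical material 1
                  0.5 * (60.0 / energies) ** 3 + 0.30]) # hypothetical material 2
lengths = np.array([20.0, 2.0])

p_full = projection(weights, atten, lengths)
w_c, a_c = compress_bins(weights, atten, n_groups=6)    # 6 is an arbitrary choice
p_comp = projection(w_c, a_c, lengths)
p_linear = lengths @ (atten @ weights)                  # monochromatic-style estimate
```

By Jensen's inequality the naive merge always overestimates the full projection and the fully linearized estimate overestimates both, which is one way to see why a calibrated or learned set of effective bins is needed to keep the compressed operator accurate.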
These two lines of work—Fourier-domain diagnostics of intermediate representations and compressed physics operators for volumetric inversion—suggest two complementary design rules. One is that spectral structure should be preserved across processing stages; the other is that spectral dimensionality should be reduced only through calibrated operators that preserve the nonlinear dependence of measurements on physical unknowns.
6. Common design patterns, limitations, and open problems
Across the cited literature, explicit forward models recur. SpectralGaussians combines 3DGS rasterization with spectral shading and composited semantic features; DDSL uses a calibrated dispersion-aware image-formation model; MSET uses multislice propagation; the 3D strain method uses Darwin–Howie–Whelan dynamics; and Spectral DPS uses a nonlinear spectral Beer–Lambert operator (Sinha et al., 2024, Shin et al., 2024, Lee et al., 2022, Niermann et al., 26 Aug 2025, Jiang et al., 28 Mar 2025). Even when the representation is learned, the successful systems are rarely pure black boxes: they are structured by physics, calibration, or explicit channel semantics.
Another common pattern is that “more detail” does not automatically imply better 3D inference. The upsampler study shows that high-frequency enhancement can correlate negatively with reconstruction when it perturbs spectral structure, with HFSS serving as a specific warning sign (Xiao et al., 6 Mar 2026). SpectralGaussians improves PSNR and SSIM consistently, yet on some benchmarks LPIPS does not improve, including the SpectralNeRF real benchmark and the spectral synthetic NeRF benchmark (Sinha et al., 2024). In 4D-STEM strain inversion, depth information becomes available through dynamical diffraction, but intensity-only inversion remains fundamentally non-unique under an exact mirror symmetry of the intensities (Niermann et al., 26 Aug 2025). These results collectively argue against the simplistic view that richer channels or sharper intermediate maps are sufficient conditions for accurate 3D recovery.
The principal limitations are also consistent. Co-registration is a strong assumption in SpectralGaussians, which explicitly requires registered spectral maps and identifies registration as a limitation (Sinha et al., 2024). DDSL is currently limited to 6.6 fps by software synchronization and to a restricted scene range by diffraction efficiency (Shin et al., 2024). MSET is demonstrated in simulation and is computationally intensive, while single-projection $\mathcal{S}$-matrix sectioning is depth-sensitive but not a replacement for full angular tomography (Lee et al., 2022, Brown et al., 2020). Spectral DPS uses a slice-wise 2D diffusion prior plus longitudinal TV rather than a true 3D learned prior (Jiang et al., 28 Mar 2025). The random-scan hyperspectral STEM workflow depends on specialized hardware and reconstruction of sparse intermediate frames (Zobelli et al., 2019).
Open problems follow directly from these constraints. The multispectral scene-rendering literature identifies end-to-end spectral registration and efficient encoding of many spectra as unresolved (Sinha et al., 2024). The 4D-STEM strain literature points to phase-sensitive routes such as ptychography and holographic tilt series to break intensity-only symmetry and to multiple specimen orientations for full strain-tensor recovery (Niermann et al., 26 Aug 2025). The upsampler-diagnostics paper calls for training objectives that directly regularize SSC, CSC, HFSS, and ADC rather than merely sharpening dense features (Xiao et al., 6 Mar 2026). The spectral CT work highlights faster reverse-SDE solvers, better initialization, and inclusion of additional nuisance physics (Jiang et al., 28 Mar 2025). Taken together, these directions indicate that spectral 3D methods are converging toward a common research program: explicit channel-aware measurement models, compressed but faithful spectral operators, geometry-consistent priors, and reconstruction objectives that preserve spectral organization rather than only local appearance.