Solar Prediction Map Classification
- Solar prediction map classification is a suite of computational methods that label solar images using curated datasets and multi-modal observations to forecast solar events like flares and coronal holes.
- Key approaches involve extracting physics-based, geometric, and topological features combined with classical and deep learning architectures to capture spatiotemporal solar patterns.
- Evaluation frameworks employ metrics such as TSS and F1 scores, ensuring robust, ensemble-based performance across diverse active regions and full solar cycles.
Solar prediction map classification refers to the suite of computational methods for assigning scientifically meaningful labels to solar images or derived maps, with the goal of supporting quantitative predictions of solar activity (e.g., flares, coronal holes, or related phenomena) and downstream space weather forecasting. These methodologies integrate solar imaging (magnetograms, EUV, H-alpha), machine learning, statistical evaluation, and domain-specific segmentation and matching techniques to extract and classify spatiotemporal patterns that correlate with eruptive events or solar wind sources.
1. Dataset Foundations and Labeling Protocols
State-of-the-art solar prediction map classification relies on large, rigorously curated datasets combining spaceborne (e.g., SDO/HMI, GOES/SUVI) and ground-based (e.g., GONG) observations. For flare prediction, magnetogram image sets are typically constructed as AR-centric cutouts, with consistent registration and pixel dimensions (e.g., 600×600 px full-res FITS, reduced to 224×224 px image tiles). Temporal coverage spans over a full solar cycle (e.g., 2010–2018), with labeling of each map/timepoint derived from GOES event logs using configurable association windows (e.g., Δt = 24 h before peak, minimum threshold C1.0) and hierarchical class encoding tied to physical flare intensity (e.g., “0”, “C1.0”, “M2.5”, “X1.0”) (Boucheron et al., 2023).
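To make the labeling protocol concrete, the following is a minimal sketch of GOES-log-based label assignment under the Δt association window described above. The DataFrame schema (`peak_time`, `goes_class`) and the function name are illustrative assumptions, not the interface of any cited pipeline:

```python
import pandas as pd

def label_magnetogram_times(mag_times, goes_events, window_hours=24,
                            min_class="C1.0"):
    """Flare/no-flare labels for magnetogram timestamps via a GOES event log.

    mag_times   : iterable of pandas Timestamps (one per AR cutout)
    goes_events : DataFrame with hypothetical columns 'peak_time' (Timestamp)
                  and 'goes_class' (e.g. 'C1.0', 'M2.5')
    A timestamp is labeled positive if any event at or above min_class peaks
    within window_hours after it (the Delta-t association window).
    """
    order = "ABCMX"

    def magnitude(c):  # map a GOES class string such as 'M2.5' to an ordinal
        return order.index(c[0]) * 10 + float(c[1:])

    events = goes_events[goes_events["goes_class"].map(magnitude)
                         >= magnitude(min_class)]
    window = pd.Timedelta(hours=window_hours)
    return [int(((events["peak_time"] > t) &
                 (events["peak_time"] <= t + window)).any())
            for t in mag_times]
```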
Coronal hole segmentation and map validation rely on multi-modal EUV observations, co-registered with photospheric magnetic field maps and human consensus for annotation. Labeling uncertainty from inter-expert disagreement is explicitly identified: initial Fleiss’s κ statistics indicate high nominal agreement when quiet Sun dominates, but much lower κ (≈0.38) for minority features such as filaments or prominences (reflecting intrinsic ambiguity in boundary placement and feature definition) (Hughes et al., 2019).
Standard splits for train/validation/test enforce segregation by active region (unique AR numbers) to avoid data leakage and to reflect operational requirements—ensuring generalization across solar disk passage or across solar cycles (Boucheron et al., 2023, Bobra et al., 2014, Chen et al., 2019).
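A minimal sketch of such an AR-segregated split using scikit-learn's `GroupShuffleSplit`; the arrays here are synthetic placeholders, but the grouping logic mirrors the leakage-avoidance requirement above:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Illustrative arrays: X is the per-sample feature matrix, y the flare labels,
# and ar_numbers the NOAA AR number of each sample (all synthetic here).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 25))
y = rng.integers(0, 2, size=1000)
ar_numbers = rng.integers(11000, 11200, size=1000)

# Grouping by AR number guarantees that no active region contributes samples
# to both the training and the test split (no data leakage across AR passages).
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=ar_numbers))
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
```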
2. Map Classification Tasks and Evaluation Frameworks
Solar prediction map classification tasks fall into several categories:
- Binary flare prediction (flare/no-flare): Assigns a dichotomous label to each AR magnetogram or time series window, based on occurrence (or forecast) of a major flare (e.g., M1.0+, C1.0+) within a pre-defined window (Boucheron et al., 2023, Bobra et al., 2014, Chen et al., 2019).
- Multi-class flare classification: Predicts among several discrete GOES classes (B, C, M, X) or on a finer intensity scale, typically using a softmax output or regression on continuous flux (Boucheron et al., 2023, Chen et al., 2019).
- Coronal hole/structural segmentation and map agreement: Assigns a region label to each pixel, focusing on mapping coronal holes or other solar features. Agreement between observed and physical (model-derived) maps is scored for operational model selection (Jatla et al., 2022, Jatla, 2022).
- Map-level “good/bad” classification: Used to select the optimal PFSS/WSA solar wind model boundary based on spatial agreement with observed segmentation (e.g., using Random Forest on area/number difference features) (Jatla et al., 2022).
Performance evaluation is anchored on class-imbalance-robust metrics such as the True Skill Statistic (TSS) and the F1 score, supplemented by domain-tailored statistics such as the Heidke Skill Score (HSS) and per-class confusion matrix analyses (Boucheron et al., 2023, Hughes et al., 2019, Bobra et al., 2014).
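As a concrete reference, a minimal sketch of these skill scores computed from a binary confusion matrix, with TSS = TP/(TP+FN) − FP/(FP+TN) and HSS in its standard two-by-two form; scikit-learn is used only for the confusion matrix and F1:

```python
from sklearn.metrics import confusion_matrix, f1_score

def skill_scores(y_true, y_pred):
    """TSS, HSS, and F1 for binary flare/no-flare predictions.
    The (tn, fp, fn, tp) ordering follows scikit-learn's confusion_matrix."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    tss = tp / (tp + fn) - fp / (fp + tn)
    hss = 2.0 * (tp * tn - fp * fn) / (
        (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn))
    return tss, hss, f1_score(y_true, y_pred)
```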
3. Feature Extraction: Physics, Geometry, and Topology
Magnetogram image classification traditionally relies on scalar physical features known to correlate with flare activity. These include, for each AR patch:
- Total unsigned magnetic flux (Φ = Σ|Bz|·dA; a minimal computation sketch follows this list)
- Vertical current density, current helicity, and related Lorentz force proxies
- Gradients, nonpotentiality, and polarity inversion line (PIL) metrics
- Physically motivated global and local descriptors (see Table 3 in (Bobra et al., 2014))
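For illustration, a minimal computation of the total unsigned flux from an AR cutout; the nominal pixel area is an approximation for SDO/HMI at disk centre and should be treated as an assumption:

```python
import numpy as np

def total_unsigned_flux(bz, pixel_area_cm2=1.33e15):
    """Total unsigned magnetic flux  Phi = sum_i |Bz_i| * dA  [Mx].

    bz             : 2-D array of radial/vertical field values in Gauss
    pixel_area_cm2 : nominal area of one pixel at disk centre
                     (approximate value for SDO/HMI, assumed here)
    """
    return np.nansum(np.abs(bz)) * pixel_area_cm2
```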
Recent research demonstrates significant added skill by incorporating geometric and topological descriptors:
- Geometry-based features: Cluster counts, maximum cluster areas and fluxes, minimum centroid distances, and interaction factors across polarity-separated masks (Deshmukh et al., 2020).
- Topology-based features: Betti numbers (e.g., β1, the number of holes) computed on cluster filtrations as a function of threshold, providing a multiscale quantification of spatial complexity. These are extracted via cubical complex construction and persistence diagrams over binary masks thresholded at successively higher field strengths (Deshmukh et al., 2020). A hole-counting sketch follows this list.
- Derived image/autoencoder features: Latent representations from deep convolutional architectures (e.g., 10-layer spatial autoencoders), followed by statistical feature selection (marginal two-sample t-tests), support both the capture of high-dimensional spatial patterns and a reduction to tractable feature sets for time-series LSTM classifiers (Chen et al., 2019).
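The topological features above can be approximated without a persistent-homology library. The sketch below counts connected components (β0) and enclosed holes (β1, approximated as interior background components) across a filtration of |Bz| thresholds; it is an illustrative stand-in for the cubical-complex computation in (Deshmukh et al., 2020), not a reimplementation:

```python
import numpy as np
from scipy import ndimage

def betti_numbers_2d(mask):
    """Approximate Betti numbers of a 2-D binary mask.
    beta0 = connected components of the foreground;
    beta1 = enclosed holes, counted as background components
            that do not touch the image border."""
    beta0 = ndimage.label(mask)[1]
    bg_labels, n_bg = ndimage.label(~mask)
    border = np.unique(np.concatenate([bg_labels[0, :], bg_labels[-1, :],
                                       bg_labels[:, 0], bg_labels[:, -1]]))
    border = border[border != 0]          # ignore the foreground label 0
    beta1 = n_bg - len(border)
    return beta0, beta1

def topology_features(bz, thresholds):
    """Betti-number curves over a filtration of |Bz| thresholds,
    mirroring the multiscale descriptors described above."""
    return [betti_numbers_2d(np.abs(bz) >= t) for t in thresholds]
```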
A hierarchy emerges: classical physics-based features capture AR global energetics and nonpotentiality, geometry/topology features encode organization and spatial structure, while autoencoder-derived features exploit patterns inaccessible to human-crafted scalars. Combined feature sets (e.g., SHARPs + topology) yield the highest discriminatory power, with measurable TSS gains over SHARPs-only baselines (Deshmukh et al., 2020).
4. Classification Architectures: Classical, Deep, and Hybrid
Multiple ML frameworks have been validated for solar prediction map classification:
- Support Vector Machines (SVMs): Applied on feature vectors derived from magnetogram patches, with RBF (Gaussian) or linear kernels and class-dependent slack penalties. Best results are achieved with univariate feature selection (the top 13, or even just the 4 most discriminative, features), yielding TSS values of roughly 0.76–0.82 (see Section 7), with the higher values obtained under stricter negative-example selection (Bobra et al., 2014). A minimal pipeline sketch follows this list.
- Random Forests (RFs): Used both for pixel-level theme classification (spectral, pixel-wise theme assignment from binned EUV/H-alpha) and for map-level “good/bad” assignment based on difference vectors in area/cluster counts. RFs outperform Gaussian and GMM classifiers in per-class skill and temporal stability (Hughes et al., 2019, Jatla et al., 2022).
- Deep Convolutional Networks: Transfer-learned VGG16 (pretrained on ImageNet, with the final fully connected layer modified for binary/multiclass softmax output) achieves skill on par with SVMs; e.g., TSS scores of 0.5094 (full-resolution) and 0.5325 (reduced-resolution) on HMI cutouts (Boucheron et al., 2023).
- Sequence Models (LSTM): Time series of magnetograms are encoded via SHARP or autoencoder features and classified using two-layer LSTMs. These methods capture both spatial and temporal evolution, delivering TSS of approximately 0.80 (see Section 7) for strong (M/X) versus weak event prediction up to 24 h in advance (Chen et al., 2019).
- Segmentation and Matching: Level-set methods (e.g., Distance Regularized Level Set Evolution, DRLSE) produce binary masks for coronal holes, with cluster matching via Mahalanobis distance and bipartite linear assignment. Automated classification systems combine PCA, KNN, and SVM for model map selection, achieving overall map classification accuracies of 72.9–75.4% (KNN/SVM) and outperforming individual humans (Jatla, 2022). Ensemble models combining random forest and deep networks are used for object- and map-level classification in operational map selection pipelines (Jatla et al., 2022).
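To ground the SVM recipe from the first bullet, here is a minimal scikit-learn pipeline assuming a SHARP-style feature matrix; the univariate F-test selector and balanced class weights stand in for the feature ranking and class-dependent slack penalties described in (Bobra et al., 2014), so treat it as a sketch rather than a reproduction of that study:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC

# Hypothetical SHARP-style feature matrix X_train (n_samples x n_features)
# with binary flare labels y_train; the pipeline: standardize, rank features
# with a univariate F-test, keep the top 13, then fit an RBF-kernel SVM with
# class weighting to counter the flare/no-flare imbalance.
clf = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=13)),
    ("svm", SVC(kernel="rbf", C=1.0, class_weight="balanced")),
])
# clf.fit(X_train, y_train); y_pred = clf.predict(X_test)
```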
5. Segmentation, Cluster Matching, and Physical Model Selection
Segmentation algorithms for coronal hole detection are centered on level-set evolution regularized by both image gradients (EUV) and magnetic neutral-line boundaries. Multi-modal seed maps—blending classical intensity/magnetism (Henney–Harvey), deep encoder-decoder nets (SegNet, FCN/VGG-16), and random forest blob rejection—serve as the initialization for signed-distance PDE evolution, which locks onto true physical boundaries without crossing polarity-inversion lines (Jatla et al., 2022).
After segmentation, coronal hole clusters are grouped via centroid proximity, and new/missing clusters are detected using a feature vector with Mahalanobis-distance thresholding against a training set manually matched by experts. Remaining clusters are matched by minimum great-circle centroid distance using a bipartite optimal assignment solved as a linear program.
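A minimal sketch of the centroid-based matching step: great-circle distances between observed and model cluster centroids form the cost matrix of an assignment problem solved with `scipy.optimize.linear_sum_assignment` (the Mahalanobis-based new/missing detection is omitted here, and centroid coordinates are assumed to be latitude/longitude pairs in degrees):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def great_circle(lat1, lon1, lat2, lon2):
    """Central angle (radians) between two points on the solar sphere."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    return np.arccos(np.clip(
        np.sin(lat1) * np.sin(lat2) +
        np.cos(lat1) * np.cos(lat2) * np.cos(lon1 - lon2), -1.0, 1.0))

def match_clusters(obs_centroids, model_centroids):
    """Bipartite matching of observed vs. model coronal-hole clusters by
    minimum total great-circle centroid distance (linear assignment)."""
    cost = np.array([[great_circle(la1, lo1, la2, lo2)
                      for (la2, lo2) in model_centroids]
                     for (la1, lo1) in obs_centroids])
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols)), cost[rows, cols]
```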
Map-level classification relies on features measuring the number and area of new/missing clusters, area over/under-estimation, and pixel-level overlap. Random forests trained with six such metrics achieve 95.5% accuracy in classifying “good” vs. “bad” physical model maps for input into operational solar wind predictions, substantially exceeding the accuracy of individual human raters (Jatla et al., 2022). The final candidate map is the one maximizing spatial agreement with the observed segmentation masks.
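A minimal sketch of such a map-level “good/bad” classifier, assuming six per-map difference features as described above (feature names and data are synthetic placeholders, not the features used in the cited work):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Six illustrative per-map difference features: counts/areas of new and
# missing clusters, over-/under-estimated area, and pixel-level overlap.
rng = np.random.default_rng(1)
X_maps = rng.normal(size=(200, 6))       # synthetic per-map feature vectors
y_maps = rng.integers(0, 2, size=200)    # 1 = "good" map, 0 = "bad" map

rf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                            random_state=0)
rf.fit(X_maps, y_maps)

# Rank candidate model maps by predicted "good" probability; the top-ranked
# candidate would be retained for the operational solar wind prediction.
good_prob = rf.predict_proba(X_maps)[:, 1]
```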
6. Limitations, Recommendations, and Future Directions
Critical limitations include:
- Sensitivity to class imbalance: the prevalence of non-flaring examples requires class-weighting or sampling corrections.
- Spectral/pixel-wise classifiers cannot distinguish spatially ambiguous solar features (e.g., filaments vs. quiet Sun) without explicit shape priors (Hughes et al., 2019).
- Inter-expert annotation variance, especially at boundaries of diffuse structures, constrains ground-truth reliability and thus model upper performance (Hughes et al., 2019).
- For segmentation-driven map classification, there is residual error in cluster matching and new/missing detection, though current methods achieve ≳87.7% accuracy against consensus (Jatla, 2022).
Best practices and forward-looking strategies involve:
- Augmenting physics-based features with geometry and topology for improved discriminatory skill (Deshmukh et al., 2020).
- Incorporating temporal evolution, e.g., stacking time-resolved cutouts or using 3D CNNs/LSTM hybrids (Boucheron et al., 2023).
- Implementing spatial priors via region-based or CNN-driven segmentation, and leveraging crowd-sourced labeling for larger, more diverse training sets (Hughes et al., 2019).
- Exploring attention mechanisms, multiwavelength data fusion, and online/continually adapting ensembles for operational robustness.
- Systematic validation on AR-partitioned and solar-cycle segmented test sets to assure model generality in operational deployment.
7. Quantitative Benchmarks and Validation Results
The following table summarizes reported predictive skill using key algorithms and datasets (primary metric: TSS; a secondary skill score is listed where reported):
| Study | Algorithm/Feature Set | TSS | Secondary score | Classification Task |
|---|---|---|---|---|
| (Boucheron et al., 2023) | SVM (features), VGG16 (images) | 0.53 | -- | Flare vs. non-flare (C1+, 24 h) |
| (Bobra et al., 2014) | SVM (top 13 SHARP features) | 0.76–0.82 | 0.15–0.42 | M/X flare (24h operational) |
| (Chen et al., 2019) | LSTM on SHARP/autoencode features | 0.80 | 0.90 | M/X vs. quiet (24h) |
| (Deshmukh et al., 2020) | NN: SHARPs + topology | 0.73 | 0.17 | M1+ flare (24 h window) |
| (Jatla, 2022, Jatla et al., 2022) | Level-set + cluster + RF | 0.75–0.96* | -- | Coronal hole map “good/bad” |
*Note: For segmentation-driven map selection, quoted numbers are overall classification accuracy.
Reported results indicate that (a) simpler geometric/topological descriptors supplement physics-based features, (b) deep transfer learning and standard CNNs yield comparable results to classical ML when provided sufficient preprocessing/intensity normalization, and (c) automated segmentation and cluster-matching systems outperform individual humans in coronal hole-derived map validation.
A plausible implication is that future operational pipelines will increasingly integrate automated segmentation, flexible feature fusion (physics, geometry, topology), and robust, adaptive ML models to deliver both regional event forecasts and global map-based physical model selection.