Radiomics-Guided Ensemble

Updated 13 October 2025

Radiomics-guided ensembles are hybrid frameworks that integrate hand-crafted radiomic features with deep learning outputs to improve predictive accuracy in precision medicine.
They employ both feature-level and decision-level fusion techniques, including voting, concatenation, and meta-model stacking, to leverage complementary information.
These ensembles enhance model robustness and interpretability, significantly benefiting applications such as cancer diagnosis, risk stratification, and adaptive radiotherapy.

Radiomics-guided ensembles are hybrid methodological frameworks that systematically integrate radiomic features (quantitative, often hand-crafted descriptors extracted from medical images) with outputs from other computational pipelines, such as deep learning models or models built on clinical or multi-omics data, to enhance predictive, prognostic, or diagnostic modeling in precision medicine. By leveraging the complementary strengths of hand-crafted and learned features, these ensembles are often reported to improve on the accuracy and robustness of either approach used alone, while retaining some of the interpretability of hand-crafted features. Their defining attribute is the use of ensemble strategies (voting, score-level fusion, or feature-level combination) that are explicitly guided by radiomics-derived information.

1. Conceptual Foundations and Rationale

Radiomics refers to the extraction of quantitative features—intensity statistics, regional texture, shape descriptors, and higher-order metrics—from segmented regions of interest (ROIs) in medical images, generating a high-dimensional vector space characterizing each lesion or tissue (Afshar et al., 2018). Traditional radiomics utilizes engineered features (hand-crafted radiomics, HCR), whereas deep learning-based radiomics (DLR or “discovery radiomics”) automatically learns abstract image representations, often without explicit segmentation.

The rationale for radiomics-guided ensemble modeling is grounded in the observation that HCR and DLR yield complementary information. HCR features, while interpretable and closely aligned with domain expertise, may not capture all the high-order dependencies present in complex images and frequently require careful segmentation. DLR features, though potentially more expressive, pose challenges in interpretability and often require large, well-annotated training corpora for generalization. By constructing ensembles that aggregate the strengths of both and, when appropriate, fuse additional data sources (genomic, clinical, multi-modal imaging), one can build models exhibiting both increased stability and superior predictive capacity (Afshar et al., 2018, Tortora et al., 2022, Zhang et al., 2019, Chen et al., 2023).

2. Data Fusion and Ensemble Methodologies

Two principal classes of radiomics-guided ensemble strategies are prevalent: data fusion (feature-level) and output aggregation (decision-level).

A. Feature-Level Fusion:

Feature-level fusion concatenates or otherwise combines radiomics-derived vectors (e.g., intensity histograms, GLCM texture features, and shape descriptors) with deep learning feature vectors or features from other modalities prior to classifier training. Dimensionality reduction or statistical feature selection, such as the χ² test, principal component analysis, or penalized regression, often precedes final modeling to handle collinearity and the curse of dimensionality (Afshar et al., 2018, Chen et al., 2023, Peeters et al., 2019). The resulting hybrid vector is then passed to a machine learning model (SVM, random forest, neural network, or Cox model).
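The fusion step described above can be illustrated with a minimal standard-library sketch. It is an assumption-laden toy, not the pipeline of any cited study: the function names (`fuse_features`, `zscore_columns`, `pearson`), the 0.95 correlation threshold, and the toy data are all hypothetical, and a greedy correlation filter stands in for the χ², PCA, or penalized-regression selection mentioned in the text.

```python
import math

def zscore_columns(rows):
    """Standardize each column to zero mean and unit (population) variance."""
    n = len(rows)
    out_cols = []
    for col in zip(*rows):
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        # Constant columns carry no information; map them to zeros.
        out_cols.append([(v - mean) / std if std > 0 else 0.0 for v in col])
    return [list(r) for r in zip(*out_cols)]

def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def fuse_features(hcr, dlr, max_abs_corr=0.95):
    """Concatenate HCR and DLR vectors per patient, standardize them, and greedily
    drop any column whose |correlation| with an already kept column is too high."""
    fused = zscore_columns([list(h) + list(d) for h, d in zip(hcr, dlr)])
    cols = list(zip(*fused))
    kept = []
    for j, col in enumerate(cols):
        if all(abs(pearson(col, cols[k])) <= max_abs_corr for k in kept):
            kept.append(j)
    reduced = [[row[j] for j in kept] for row in fused]
    return reduced, kept

# Toy data: 4 patients, 2 hand-crafted features and 2 deep features each.
# The second hand-crafted feature is almost a rescaled copy of the first.
hcr = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 41.0]]
dlr = [[0.3, 5.0], [0.9, 1.0], [0.1, 4.0], [0.5, 2.0]]
fused, kept = fuse_features(hcr, dlr)
print(kept)  # [0, 2, 3]: the near-duplicate hand-crafted column is dropped
```

The reduced matrix would then be handed to whichever classifier or survival model is in use. In practice the selection step should be fitted on training data only, so that the filter does not leak information from the test set.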

B. Decision-Level Ensemble and Voting:

Decision-level ensemble learning involves constructing separate classifiers on HCR and DLR features, then combining their probabilistic outputs or discrete predictions using methods such as:

Soft voting: Averaging or weighted averaging of class probabilities from HCR and DLR models. For class label $c$:

$P(c) = \alpha P_{\mathrm{HCR}}(c) + (1 - \alpha) P_{\mathrm{DLR}}(c),$

where $\alpha \in [0,1]$ can be set or learned (Afshar et al., 2018).

Hard voting: Majority voting among the predicted class labels.
Adaptive voting: Learning example- or stream-specific weights using a secondary classifier, e.g., stacking.
Meta-model fusions: Using risk scores (e.g., from Random Forests trained separately on HCR and DLR) as meta-features for a “stage 2” model, which can be another ensemble (Zhang et al., 2019).
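As a rough illustration of the voting schemes listed above, the sketch below implements the soft-voting formula $P(c) = \alpha P_{\mathrm{HCR}}(c) + (1 - \alpha) P_{\mathrm{DLR}}(c)$, a majority vote, and one simple way $\alpha$ might be learned. The grid search over a validation set and the log-loss criterion are assumptions made for this example; the cited works may choose or learn the weight differently. All function names are hypothetical.

```python
import math
from collections import Counter

def soft_vote(p_hcr, p_dlr, alpha=0.5):
    """Weighted average of two class-probability vectors."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(p_hcr, p_dlr)]

def hard_vote(labels):
    """Majority vote over predicted labels; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def log_loss(y_true, probs, eps=1e-12):
    """Mean negative log-likelihood of the true class."""
    return -sum(math.log(max(p[y], eps)) for y, p in zip(y_true, probs)) / len(y_true)

def fit_alpha(y_true, probs_hcr, probs_dlr, steps=20):
    """Choose alpha in [0, 1] by grid search, minimizing validation log loss."""
    best_alpha, best_loss = 0.0, float("inf")
    for i in range(steps + 1):
        alpha = i / steps
        fused = [soft_vote(a, b, alpha) for a, b in zip(probs_hcr, probs_dlr)]
        loss = log_loss(y_true, fused)
        if loss < best_loss:
            best_alpha, best_loss = alpha, loss
    return best_alpha

# Toy validation set with two patients and two classes.
y_val = [0, 1]
val_hcr = [[0.9, 0.1], [0.1, 0.9]]   # HCR model: confident and correct
val_dlr = [[0.4, 0.6], [0.6, 0.4]]   # DLR model: wrong on both cases
alpha = fit_alpha(y_val, val_hcr, val_dlr)
print(alpha)                 # 1.0: all weight goes to the HCR model here
print(hard_vote([1, 0, 1]))  # 1
```

Stacking generalizes `fit_alpha`: instead of a single scalar weight, a second-stage model is trained on the base models' outputs, ideally using out-of-fold predictions so the meta-model does not see outputs produced on the base models' own training data.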

C. Hybrid, Multi-Source Ensembles:

Radiomics-guided ensembles frequently extend to incorporate feature sets from multiple imaging modalities (e.g., PET and CT, DCE and ADC MRI) (Afshar et al., 2018, Kim et al., 5 Jun 2024) or from distinct clinical domains (genomics, clinical records, histopathology). For multimodal or multi-source integration, both feature- and decision-level ensembles are applied, occasionally in hierarchical workflows.

D. Special Voting/Fusion Schemes:

In multimodal paradigms, late fusion strategies (such as the product, mean, and min/max rules, decision templates, Dempster-Shafer combination, and patient-level aggregation) have been implemented to fuse outputs from per-modality classifiers. For example:

Product rule:

$\chi_j = \prod_{i=1}^L \mu_{i, j}$

where $\mu_{i, j}$ is the classifier support from modality $i$ for class $\omega_j$ (Tortora et al., 2022).

Mean rule:

$\chi_j = \frac{1}{L} \sum_{i=1}^L \mu_{i, j}$

These approaches allow flexible inclusion of heterogeneous decision profiles with controlled aggregation.
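The two rules above can be compared on a small, made-up decision profile. In this sketch `supports[i][j]` plays the role of $\mu_{i,j}$ (support from modality $i$ for class $\omega_j$); the function names and the numbers are illustrative assumptions, not values from the cited work.

```python
import math

def product_rule(supports):
    """chi_j = product over modalities i of mu_{i,j}."""
    return [math.prod(col) for col in zip(*supports)]

def mean_rule(supports):
    """chi_j = (1 / L) * sum over modalities i of mu_{i,j}."""
    return [sum(col) / len(col) for col in zip(*supports)]

def decide(chi):
    """Index of the class with the largest fused support."""
    return max(range(len(chi)), key=chi.__getitem__)

# Three per-modality classifiers, two classes. Two modalities favor class 0;
# the third assigns class 0 a very low support.
supports = [
    [0.80, 0.20],
    [0.80, 0.20],
    [0.05, 0.95],
]
print(decide(product_rule(supports)))  # 1: one near-zero support acts almost as a veto
print(decide(mean_rule(supports)))     # 0: averaging follows the majority of modalities
```

The example highlights a known trade-off: the product rule is sensitive to a single classifier assigning near-zero support, whereas the mean rule is more tolerant of one dissenting or poorly calibrated modality.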

3. Quantitative Modeling and Performance Assessment

Radiomics-guided ensembles have demonstrated substantial benefit in multiple domains as measured by standard quantitative metrics. Commonly employed metrics include accuracy, AUC, concordance index (C-index), time-dependent AUC (t-AUC), integrated Brier score, hazard ratio (HR), and explained residual variation $R^2$ (Mali et al., 23 May 2025, Peeters et al., 2019, Zhang et al., 2019, Tortora et al., 2022).
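Of the metrics listed above, the concordance index is perhaps the least familiar outside survival analysis. The following sketch computes it in the common pairwise (Harrell-style) form, under simplifying assumptions stated here: a pair is comparable only when the subject with the shorter follow-up time had an observed event, pairs with tied times are skipped, and tied risk scores count as half-concordant. The function name and toy data are hypothetical.

```python
def concordance_index(times, events, risks):
    """Fraction of comparable pairs in which the subject who failed earlier
    was assigned the higher predicted risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable pair: subject i has an observed event strictly before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy cohort: follow-up times, event indicators (1 = event, 0 = censored),
# and risk scores from some model.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.5, 0.1]
print(concordance_index(times, events, risks))  # 1.0: risk ordering matches failure order
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. For real analyses an established survival library is preferable to this quadratic-time sketch.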

Notable findings include:

A decision-level risk-score ensemble raised the AUC for survival prediction in pancreatic ductal adenocarcinoma to 0.86, compared with 0.65 for PCA-based fusion and 0.57 for Boruta-based feature selection (Zhang et al., 2019).
A multimodal “RadioPathomics” trimodal ensemble, utilizing radiomics, pathomics, and clinical features, achieved an AUC of 90.9%, compared with about 87% for the best unimodal model (Tortora et al., 2022).
Consensus models that average or require agreement across multiple top ROI ensembles (for example, tumor, lung, mediastinum, coronary artery, and deep features) reported a 5-year t-AUC of 0.92, with sensitivity of 97.6% and specificity of 66.7% (Mali et al., 23 May 2025).

Performance gains are often most pronounced in settings where each data stream brings complementary, partially uncorrelated information—e.g., deep features can capture textural or multi-scale patterns missed by hand-crafted features, while radiomics provides interpretable clinical context and robustness to dataset size and domain shift.

4. Interpretability, Stability, and Harmonization

Stability and interpretability are critical in clinical ensemble model deployment. High feature collinearity and variability (from acquisition protocol, scanner, reconstruction parameters) can undermine model reproducibility and validity.

To address these issues:

Feature stabilization: Redundancy filtering and penalized covariance estimation, followed by maximum-likelihood factor analysis, can produce compact meta-features that are nearly orthogonal and robust to multicollinearity (Peeters et al., 2019).
Harmonization: Harmonization strategies, such as ComBat (feature-level) and reconstruction kernel normalization (RKN, image-level), reduce scanner and protocol bias, thereby improving generalization in multicenter studies (Mali et al., 23 May 2025, Selim et al., 2021).
SHAP (SHapley Additive exPlanations) values: These are increasingly used to quantify and visualize the contributions of individual radiomic, deep, or clinical features to the hazard or classification scores, fostering model transparency (Mali et al., 23 May 2025).
Dynamic feature selection: Ensuring that only stable and discriminative radiomics features (e.g., those passing Wilcoxon, PCA, or concordance criteria) enter the ensemble helps prevent overfitting and promotes reproducible biomarker discovery (Flouris et al., 2022).
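To make the idea of feature-level harmonization concrete, the sketch below aligns each feature's per-scanner mean and spread to the pooled values. This is explicitly not ComBat: it omits ComBat's empirical Bayes shrinkage and its preservation of biological covariates, so it would also remove genuine between-site differences. It is a deliberately simplified location-scale stand-in, with hypothetical function names and toy data.

```python
import math
from collections import defaultdict

def _mean_std(values):
    """Mean and population standard deviation of a list of numbers."""
    m = sum(values) / len(values)
    s = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return m, s

def harmonize_location_scale(features, batches):
    """Per feature, z-score within each batch (e.g., scanner), then rescale
    to the pooled mean and standard deviation across all batches."""
    out = [list(row) for row in features]
    idx_by_batch = defaultdict(list)
    for i, b in enumerate(batches):
        idx_by_batch[b].append(i)
    for j in range(len(features[0])):
        pooled_mean, pooled_std = _mean_std([row[j] for row in features])
        for idx in idx_by_batch.values():
            b_mean, b_std = _mean_std([features[i][j] for i in idx])
            for i in idx:
                z = (features[i][j] - b_mean) / b_std if b_std > 0 else 0.0
                out[i][j] = pooled_mean + pooled_std * z
    return out

# One radiomic feature measured on two scanners with a systematic offset.
features = [[1.0], [2.0], [3.0], [11.0], [12.0], [13.0]]
batches = ["A", "A", "A", "B", "B", "B"]
harmonized = harmonize_location_scale(features, batches)
mean_a = sum(row[0] for row in harmonized[:3]) / 3
mean_b = sum(row[0] for row in harmonized[3:]) / 3
print(round(mean_a, 6), round(mean_b, 6))  # both batch means now equal the pooled mean
```

As with feature selection, the batch statistics should be estimated on training data and then applied to held-out cases to avoid optimistic bias.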

5. Multi-domain and Clinical Applications

Radiomics-guided ensembles have found application in various medical imaging and decision support contexts, including:

Cancer diagnosis and risk stratification: Integrating HCR, DLR, and clinical features in ensembles has improved discriminative performance for benign vs. malignant tumor classification, survival prediction in NSCLC, and recurrence in head and neck or pancreatic cancers (Afshar et al., 2018, Zhang et al., 2019, Tortora et al., 2022, Mali et al., 23 May 2025).
Adaptive radiotherapy: Tracking delta radiomics—temporal changes in radiomics features during treatment—enables real-time adaptation of dosing regimes in stereotactic MRI-guided radiotherapy by correlating radiomic trajectories with treatment response and outcome (Zha et al., 23 Feb 2024).
Ensemble modeling in segmentation: Ensembles of deep networks incorporating spatially encoded radiomics feature maps have demonstrated higher Dice coefficients and reduced uncertainty in glioma segmentation compared to deep learning-only or paired fusion approaches (Chen et al., 2023).
Multimodal image modeling: Frameworks utilizing late fusion to combine radiomics with pathomics and clinical data yield superior prediction of radiotherapy outcomes, indicating the benefit of simultaneous ensemble integration of orthogonal sources (Tortora et al., 2022).
Image retrieval and synthesis: Ensemble approaches that align radiomics-based and deep embeddings facilitate content-based medical image retrieval and radiomics-conditioned tumor synthesis for augmentation, teaching, or planning (Na et al., 11 Jul 2025, Kim et al., 29 Sep 2025).
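The delta radiomics idea mentioned under adaptive radiotherapy can be sketched as a per-feature change between two time points. The relative-change definition used here is one common convention and is an assumption of this example, as are the function name, the feature names, and the values; the cited study may define its delta features differently.

```python
def delta_radiomics(baseline, followup, eps=1e-12):
    """Relative change per feature: (followup - baseline) / |baseline|.
    Features missing at follow-up are skipped; a (near-)zero baseline yields NaN."""
    deltas = {}
    for name, base in baseline.items():
        if name not in followup:
            continue
        denom = abs(base)
        deltas[name] = (followup[name] - base) / denom if denom > eps else float("nan")
    return deltas

# Hypothetical features for one lesion before treatment and at a mid-treatment scan.
baseline = {"glcm_contrast": 2.0, "volume_mm3": 1000.0}
followup = {"glcm_contrast": 3.0, "volume_mm3": 800.0}
print(delta_radiomics(baseline, followup))
# contrast rose by 50% and volume fell by 20% relative to baseline
```

The resulting delta vector can be treated like any other feature block in the fusion schemes of Section 2, for example concatenated with baseline features or fed to its own classifier in a decision-level ensemble.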

6. Limitations, Challenges, and Future Perspectives

Despite strong performance, several challenges persist:

Dimensionality and overfitting: High-dimensional concatenation of radiomics and DLR features elevates overfitting risk, necessitating robust feature selection and regularization (Peeters et al., 2019, Zhang et al., 2019).
Data integration complexity and interpretability: Ensemble voting and hierarchically fused models, especially with adaptive or meta-learned weights, can be opaque, posing challenges for clinical trust (Afshar et al., 2018, Zhang et al., 2019).
Stability and harmonization requirements: Models are sensitive to acquisition variability; comprehensive harmonization and systematic stability assessment must precede broad deployment (Mali et al., 23 May 2025, Flouris et al., 2022, Selim et al., 2021).
Limited multimodal datasets for deep and hybrid modeling: Data scarcity—especially for multi-modal or multi-omics ensembles—still limits the generalizability and statistical power of ensemble studies.
Semantic redundancy: While feature-level fusion may boost empirical accuracy, it also risks collapsing to a subset of correlated variables, limiting gains unless careful feature decorrelation and weighting are employed (Peeters et al., 2019).

Emerging trends include more sophisticated meta-ensembles (e.g., stacking, Bayes-optimal weighting), deliberate exploitation of temporal radiomics (delta radiomics in adaptive therapy), and harmonization pipelines spanning image and feature spaces to future-proof predictive modeling against acquisition drift (Mali et al., 23 May 2025, Selim et al., 2021, Zha et al., 23 Feb 2024).

7. Summary Table: Key Ensemble Fusion Strategies

Ensemble Strategy	Fusion Level	Description
Soft/Hard/Adaptive Voting	Decision	Combine predictions from independent HCR and DLR models via weighted or majority voting (Afshar et al., 2018)
Feature Concatenation	Feature	Concatenate HCR and DLR feature vectors, possibly after selection, and input to single classifier
Risk Score Fusion	Decision/meta	Train separate classifiers, use their risk scores as meta-features for final model (Zhang et al., 2019)
Multimodal Late Fusion	Decision/meta	Aggregate per-modality classifier outputs using fusion rules (product, mean, Dempster-Shafer, etc.) (Tortora et al., 2022)
Multi-source Aggregation	Multi-level	Integrate clinical, omics, multi-modality imaging, and radiomics within hierarchical ensemble pipeline

Radiomics-guided ensembles have become an established methodology in data-driven medical image analysis. They support robust and interpretable integration of domain-informed engineered features, deep representations, and multi-source evidence, with the aim of advancing precision diagnostics, prognosis, and therapy planning across heterogeneous clinical settings.