Radiomics Feature Extractor
- Radiomics Feature Extractor is a computational system that maps segmented medical images to high-dimensional quantitative feature vectors, summarizing morphology, intensity, and texture.
- It integrates hand-crafted, deep learning, and hybrid pipelines to standardize feature extraction and boost reproducibility across diverse imaging cohorts.
- Advanced methods such as wavelet filtering, GPU acceleration, and tensor radiomics enhance predictive power and address challenges like dimensionality and preprocessing sensitivity.
A radiomics feature extractor is a computational system that maps medical images, typically with segmented regions-of-interest (ROI), to high-dimensional quantitative feature vectors that encode morphological, histogram-based, and spatial texture biomarkers. The extractor operationalizes radiomics as an algorithmic bridge between complex medical imaging data and machine learning–ready tabular representations, with the goal of supporting clinical prediction, diagnosis, prognosis, retrieval, or biomarker discovery in large, heterogeneous imaging cohorts (Afshar et al., 2018). Feature extractors may be realized as modular hand-crafted pipelines, deep learning–driven models, or hybrid systems, and are foundational in standardized, reproducible imaging research and precision medicine.
1. Hand-Crafted Feature Extraction: Categories and Mathematical Formalism
Hand-crafted feature extraction pipelines operate on intensity-resolved medical images and well-defined ROIs (2D or 3D masks). Standardized systems such as PyRadiomics, SERA, and PySERA compute a battery of features, with formulae and implementation parameters harmonized by the Image Biomarker Standardization Initiative (IBSI) (Salmanpour et al., 20 Nov 2025, Primakov et al., 2022).
Principal feature groups:
- First-order statistics: Mean, variance, skewness, kurtosis, energy, entropy, uniformity, min/max, percentiles, RMS, interquartile ranges. These summarize the intensity histogram inside the ROI; e.g., entropy is defined as $H = -\sum_i p(i)\,\log_2 p(i)$, where $p(i)$ is the normalized histogram of discretized intensities (Salmanpour et al., 20 Nov 2025).
- Shape and morphology: Volume, surface area, compactness, sphericity ($\pi^{1/3}(6V)^{2/3}/A$ for ROI volume $V$ and surface area $A$), maximum 3D/planar diameters, convexity, elongation (Salmanpour et al., 20 Nov 2025, Na et al., 11 Jul 2025).
- Second-order texture features:
- GLCM (Gray Level Co-occurrence Matrix): $P(i,j)$ is the frequency of intensity pairs $(i, j)$ co-occurring at a defined distance and direction, leading to contrast, energy (ASM), homogeneity, correlation, entropy (Vallières et al., 2017, Feng et al., 15 Oct 2025, Primakov et al., 2022).
- GLRLM (Gray Level Run-Length Matrix): $R(i,j)$ counts the number of runs of length $j$ at gray level $i$, yielding short/long run emphasis, non-uniformity measures.
- GLSZM (Gray Level Size Zone Matrix): $S(i,j)$ is the count of connected zones of size $j$ at gray level $i$, leading to small/large zone emphasis, zone non-uniformity (Salmanpour et al., 20 Nov 2025).
- NGTDM (Neighborhood Gray Tone Difference Matrix) and GLDM (Gray Level Dependence Matrix): compute local contrast, coarseness, and dependence statistics.
- Higher-order and filtered features:
- Wavelet filters (e.g., Haar, Daubechies) and Laplacian-of-Gaussian (LoG) filters are convolved with the input image; first- and second-order features are then recomputed on each sub-band or filter-response map (Depeursinge et al., 2020, Primakov et al., 2022).
- Moment invariants: Based on Hu moments and normalized central moments (e.g., $\phi_1 = \eta_{20} + \eta_{02}$), providing affine/rotation-invariant shape descriptors (Salmanpour et al., 20 Nov 2025).
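To make the first-order and GLCM definitions above concrete, here is a minimal NumPy sketch. It is illustrative only, not an IBSI-compliant extractor; the function names (`first_order_features`, `glcm`, `glcm_contrast`) are our own:

```python
import numpy as np

def first_order_features(roi: np.ndarray) -> dict:
    """Histogram-based statistics over ROI voxel intensities."""
    x = roi.ravel().astype(float)
    hist, _ = np.histogram(x, bins=32)
    p = hist / hist.sum()                       # normalized histogram p(i)
    p_nz = p[p > 0]
    return {
        "mean": x.mean(),
        "variance": x.var(),
        "skewness": ((x - x.mean()) ** 3).mean() / (x.std() ** 3 + 1e-12),
        "energy": float(np.sum(x ** 2)),
        "entropy": float(-np.sum(p_nz * np.log2(p_nz))),
        "uniformity": float(np.sum(p ** 2)),
    }

def glcm(img: np.ndarray, levels: int, offset=(0, 1)) -> np.ndarray:
    """Symmetric, normalized gray-level co-occurrence matrix for one 2D offset."""
    P = np.zeros((levels, levels))
    dr, dc = offset
    rows, cols = img.shape
    for r in range(rows - dr):
        for c in range(cols - dc):
            P[img[r, c], img[r + dr, c + dc]] += 1
    P = P + P.T                                 # symmetrize
    return P / P.sum()

def glcm_contrast(P: np.ndarray) -> float:
    """Contrast: sum of P(i,j) * (i - j)^2 over all gray-level pairs."""
    i, j = np.indices(P.shape)
    return float(np.sum(P * (i - j) ** 2))
```

Production pipelines compute many offsets (distances and directions) and either average the resulting matrices or the per-offset feature values.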
Parameter sweeps (voxel size, discretization, interpolation) are critical; tensorized multi-flavour approaches stack feature sets across multiple parameterizations to boost robustness and predictive power (Rahmim et al., 2022).
2. Preprocessing and Standardization
Successful feature extraction requires a reproducible preprocessing pipeline that harmonizes data across sites, scanners, and protocols (Kozák, 23 Sep 2024, Primakov et al., 2022). Critical stages include:
- Resampling to isotropic voxels (typically 1 mm³ by trilinear or B-spline interpolation).
- Bias-field correction (e.g., N4).
- Intensity normalization: Z-score or min-max scaling within the ROI or brain/tissue mask.
- Intensity discretization: Fixed bin width (e.g., 25 HU for CT) or fixed bin count; affects downstream texture matrices (Salmanpour et al., 20 Nov 2025, Primakov et al., 2022).
- Histogram matching and outlier removal: Standardizes intensity distributions across patients.
- ROI extraction: Masked volumetric subsetting; in segmentation-point pipelines (e.g., RadiomicsRetrieval), “point prompt” segmentation is deployed and cropped (Na et al., 11 Jul 2025).
All preprocessing parameters must be logged for auditability (sample spacing, discretization, normalization) to comply with IBSI guidelines (Salmanpour et al., 20 Nov 2025, Depeursinge et al., 2020).
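Two of the stages above, intensity normalization and fixed-bin-width discretization, can be sketched as follows, assuming a NumPy image and binary ROI mask (function names are illustrative, not drawn from any toolkit):

```python
import numpy as np

def znormalize(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Z-score normalization using only voxels inside the ROI mask."""
    vals = image[mask > 0]
    return (image - vals.mean()) / (vals.std() + 1e-8)

def discretize_fixed_bin_width(image: np.ndarray, mask: np.ndarray,
                               bin_width: float = 25.0) -> np.ndarray:
    """Fixed-bin-width discretization (e.g., 25 HU for CT).
    Returns integer gray levels starting at 1 inside the ROI, 0 outside."""
    out = np.zeros(image.shape, dtype=int)
    roi = mask > 0
    lo = image[roi].min()
    out[roi] = np.floor((image[roi] - lo) / bin_width).astype(int) + 1
    return out
```

Because the discretized gray levels feed directly into GLCM/GLRLM/GLSZM construction, the bin width (or bin count) must be recorded alongside every extracted feature set.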
3. Deep Radiomics and Learned Feature Extractors
Beyond hand-crafted descriptors, deep radiomic feature extractors operationalize data-driven feature learning:
- Discovery radiomics with StochasticNet radiomic sequencers: Randomly sparse CNNs (three convolutional layers whose connections are formed stochastically at a fixed connection probability) trained on preprocessed patches (e.g., from CT), followed by feature extraction as global average–pooled activations (64D per lesion) (Shafiee et al., 2015).
- Hybrid models: Combine hand-crafted and deep features via vector concatenation or ensemble methods; often outperform pure approaches (Afshar et al., 2018).
- Radiomics incorporation into DNNs: Radiomic feature maps (RFMs) computed locally via a sliding kernel, followed by principal component dimensionality reduction; the reduced RFM volumes are provided as channels to U-Net architectures for segmentation and prediction (Chen et al., 2023).
These networks require rigorous normalization and fixed scaling, and their output descriptors can be appended to or replace hand-crafted feature vectors.
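The RFM idea, local feature maps from a sliding kernel reduced by PCA into extra input channels, can be sketched as follows. The local statistics here (mean, variance, range) are simplifications standing in for the GLCM-type kernels of the cited work:

```python
import numpy as np

def local_feature_maps(img: np.ndarray, k: int = 3) -> np.ndarray:
    """Simple radiomic feature maps (local mean, variance, range)
    computed with a sliding k x k kernel; returns shape (3, H, W)."""
    H, W = img.shape
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    maps = np.zeros((3, H, W))
    for r in range(H):
        for c in range(W):
            patch = padded[r:r + k, c:c + k]
            maps[0, r, c] = patch.mean()
            maps[1, r, c] = patch.var()
            maps[2, r, c] = patch.max() - patch.min()
    return maps

def pca_reduce(maps: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Reduce the feature-map channels to n principal components per voxel."""
    C = maps.shape[0]
    X = maps.reshape(C, -1).T                  # voxels x channels
    X = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Z = X @ Vt[:n_components].T                # project onto top components
    return Z.T.reshape(n_components, *maps.shape[1:])
```

The reduced maps can then be stacked with the raw image as additional input channels for a segmentation network.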
4. Extensions: Multiparametric, Spherical, Dynamic, and Tensor Radiomics
- Multiparametric radiomics (MPRAD): Treats each voxel as an N-dimensional “tissue signature”—the vector of quantized intensities across modalities (Parekh et al., 2018). Extracted feature classes include joint entropy/uniformity (TSPM), spatial co-occurrence (TSCM), tissue-signature networks (TSRM/TSCIN), and nonlinear manifold embedding (Isomap) for downstream classification. This architecture enables true integration of multi-sequence data with demonstrated AUC improvement (e.g., breast mpMRI, AUC up to 0.87).
- Spherical radiomics: Features are computed on concentric shells around a tumor centroid (radial bins), unwrapped onto 2D grids, and processed using standard PyRadiomics; analysis of radial transition slopes between zones is predictive of molecular status and survival in GBM (Feng et al., 15 Oct 2025).
- Dynamic radiomics: Extracts the time-evolution of features from longitudinal imaging. For $T$ time points, the pipeline produces an $F \times T$ matrix ($F$ static features per time point); modeling approaches include discrete pairwise changes, integrated summary statistics, and parametric curve fitting of feature trajectories. Dynamic features (e.g., relative change ratios, global trend statistics) outperform static radiomics in cancer therapy response and mutation prediction (Che et al., 2020).
- Tensor radiomics (TR): Systematic stacking of features computed under multiple parameter flavours (bin sizes, segmentations, filters, modalities); downstream ML/DL can select robust, predictive feature sets via end-to-end architectures (TR-Net) or ensemble selection. TR achieves significant improvement in balanced accuracy and ICC (reproducibility) across multiple tasks (Rahmim et al., 2022).
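The tensor radiomics idea, the same feature set computed under several parameter flavours and stacked into a features-by-flavours tensor, can be sketched as follows (the toy `extract_flavour` stands in for a full extractor run at one discretization setting):

```python
import numpy as np

def extract_flavour(img: np.ndarray, mask: np.ndarray,
                    bin_width: float) -> np.ndarray:
    """Stand-in for a feature extractor run at one parameter 'flavour':
    mean, std, and histogram entropy at the given bin width."""
    vals = img[mask > 0]
    levels = (np.floor((vals - vals.min()) / bin_width) + 1).astype(int)
    p = np.bincount(levels)[1:] / levels.size
    p = p[p > 0]
    return np.array([vals.mean(), vals.std(), -np.sum(p * np.log2(p))])

def tensor_radiomics(img: np.ndarray, mask: np.ndarray,
                     bin_widths=(10.0, 25.0, 50.0)) -> np.ndarray:
    """Stack the same feature vector across discretization flavours
    into a (features x flavours) tensor."""
    return np.stack([extract_flavour(img, mask, bw) for bw in bin_widths],
                    axis=1)
```

Downstream selection (e.g., ICC filtering or an end-to-end TR-Net) then picks the flavour, or combination of flavours, that is most robust and predictive per feature.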
5. Computational Acceleration and Software Implementations
- PyRadiomics and PyRadiomics-cuda: PyRadiomics provides a widely adopted, IBSI-compliant Python interface; PyRadiomics-cuda offloads computational bottlenecks (mesh extraction, shape features) to the GPU via optimized CUDA kernels, yielding orders-of-magnitude speedups in shape feature extraction for large volumes (up to ~2000× acceleration for large ROIs on modern GPUs) (Lisowski et al., 3 Oct 2025).
- PySERA: Implements 557 features (487 IBSI-compliant, 10 moment invariants, 60 diagnostics), with standardized preprocessing, parallel execution, and seamless integration with scikit-learn, PyTorch, TensorFlow, and MONAI. PySERA demonstrates 94% IBSI agreement and improved generalization versus PyRadiomics (Salmanpour et al., 20 Nov 2025).
- RadiomicsRetrieval: A retrieval engine combining classic radiomics (14 shape, 18 first-order, 40 texture features via PyRadiomics) with promptable segmentation and anatomical position embeddings; radiomics-path features are aligned with deep embeddings for flexible, anatomy-aware retrieval (Na et al., 11 Jul 2025).
- Precision-medicine-toolbox: Wraps PyRadiomics with robust curation, conversion, and EDA capabilities, maintaining full traceability and supporting parameterized YAML configurations (Primakov et al., 2022).
All toolkits emphasize auditability, precise parameter control, and compatibility with multi-core/GPU execution.
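As an illustration of such parameter control, a PyRadiomics-style YAML parameter file might look like the fragment below. This is an indicative sketch; the exact keys and admissible values should be checked against the PyRadiomics customization documentation:

```yaml
# Illustrative PyRadiomics-style parameter file (verify keys against the docs)
imageType:
  Original: {}
  Wavelet: {}
featureClass:
  shape:
  firstorder:
  glcm:
setting:
  binWidth: 25                      # fixed bin width (HU for CT)
  resampledPixelSpacing: [1, 1, 1]  # isotropic 1 mm resampling
  interpolator: sitkBSpline
  normalize: true
```

Versioning this file alongside the extracted feature tables is what makes a radiomics run auditable.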
6. Quality Control, Reproducibility, and Reporting
- IBSI standardization: Enforces agreement on all feature implementations, reference phantoms, reproducibility testing, and reporting templates (Salmanpour et al., 20 Nov 2025, Depeursinge et al., 2020).
- Robustness analysis: Links between discretization, interpolation, ROI perturbation, and feature reproducibility are quantitatively assessed (e.g., via ICC testing, multi-center data sweeps, bootstrap/ensemble selection) (Rahmim et al., 2022, Vallières et al., 2017).
- Comprehensive logging: All preprocessing, parameter choices, and software versions are tracked and validated for repeatability and multicenter harmonization (Primakov et al., 2022, Salmanpour et al., 20 Nov 2025).
Best practices dictate fixed YAML/dict parameterization, inclusion of IBSI reference tables, and documentation of all pipeline steps, including uncertainties and limits in robustness.
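A minimal logging helper in this spirit, using only the standard library, could look like the following (the record layout and function name are our own, not from any toolkit):

```python
import hashlib
import json
import platform
import sys

def log_extraction_run(settings: dict, software_versions: dict,
                       path: str) -> str:
    """Write an audit record of all preprocessing/extraction parameters
    and software versions; return a short hash identifying the config."""
    record = {
        "settings": settings,              # e.g., binWidth, spacing, normalizer
        "software": software_versions,     # pinned package versions
        "python": sys.version,
        "platform": platform.platform(),
    }
    blob = json.dumps(record, sort_keys=True, indent=2)
    digest = hashlib.sha256(blob.encode()).hexdigest()[:12]
    with open(path, "w") as f:
        f.write(blob)
    return digest
```

Storing the returned hash next to each feature table lets multicenter studies verify that all sites ran an identical configuration.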
7. Limitations and Future Directions
Despite rapid advances, several challenges persist:
- Curse of dimensionality: Feature vectors can easily reach hundreds to thousands of dimensions, especially under multi-flavour stacking; robust feature selection via LASSO, RFECV, ensemble methods, or deep end-to-end selection (TR-Net, Isomap embeddings) is imperative (Kozák, 23 Sep 2024, Parekh et al., 2018, Rahmim et al., 2022).
- High memory requirements: Multiparametric and tensor approaches (e.g., TSPM, spherical shell stacks) can overwhelm GPU or CPU RAM for high-resolution, high-dimensional pipelines (Parekh et al., 2018, Salmanpour et al., 20 Nov 2025).
- Sensitivity to preprocessing: Choices of discretization, voxel size, normalization, and segmentation all influence texture feature reproducibility (Depeursinge et al., 2020, Rahmim et al., 2022, Salmanpour et al., 20 Nov 2025). Full reporting and compliance with IBSI and equivalent standards are therefore mandatory.
- Biological interpretability and validation: Machine-selected or deep-learned features may lack obvious biological correspondence, motivating hybrid models and explainable AI integration (Afshar et al., 2018, Feng et al., 15 Oct 2025).
- Expansion to 3D/4D, multi-modal, and functional imaging: Extending existing frameworks to dynamic, spherical, multi-contrast, and temporal imaging is underway, with ongoing validation work (Che et al., 2020, Feng et al., 15 Oct 2025, Parekh et al., 2018).
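One common, simple mitigation for the dimensionality problem above is pruning highly correlated features before model fitting. A greedy NumPy sketch (our own, illustrating the idea rather than any specific toolkit's selector):

```python
import numpy as np

def prune_correlated(X: np.ndarray, threshold: float = 0.95) -> list:
    """Greedy correlation filter: keep a feature column only if its
    absolute Pearson correlation with every already-kept column is
    below the threshold. Returns kept column indices."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep: list = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep
```

In practice this is typically followed by a supervised step (LASSO, RFECV, or ensemble selection) on the surviving columns.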
Radiomics feature extractors thus remain a vibrant, evolving technical domain anchoring the quantitative translation of imaging into clinical and research-centric models. Ongoing advances in mathematical definition, computational scalability, standardization, and integration with deep learning workflows are central to the future of reproducible and clinically relevant imaging biomarkers.