Scalable Residual Feature Aggregation (SRFA)
- SRFA is a scalable framework that combines advanced image preprocessing, residual feature aggregation, and multi-attention segmentation for precise early pancreatic neoplasm detection.
- It employs a novel integration of DenseNet, Vision Transformer, and EfficientNet-B3 alongside hybrid metaheuristic optimization to improve classification robustness and efficiency.
- SRFA demonstrates up to 96% accuracy with significant gains over traditional methods, highlighting its potential for near real-time clinical deployment in multimodal CT imaging.
Scalable Residual Feature Aggregation (SRFA) is a comprehensive and modular framework designed to address the complexities of early pancreatic neoplasm detection from multimodal contrast-enhanced computed tomography (CT) imaging. SRFA synthesizes advanced image preprocessing, segmentation, hierarchical feature aggregation, hybrid metaheuristic feature selection, and dual-optimized classification to enhance detection accuracy, generalization, and computational efficiency. The framework is structured explicitly to emphasize the salience of subtle visual cues and to maintain robustness across heterogeneous medical data modalities (Thiruvengadam et al., 29 Dec 2025).
1. Pipeline Architecture and Stages
The SRFA methodology encompasses five sequential processing stages:
- Preprocessing: Contrast Limited Adaptive Histogram Equalization (CLAHE), Gaussian smoothing, median filtering, and min–max normalization rescale intensities to [0, 1] to enhance subtle boundary cues and suppress noise.
- Segmentation (MAGRes-UNet): Employs a U-Net backbone augmented with residual blocks and multi-attention gates on skip connections. Deep supervision integrates high- and low-level features to make fine pancreatic structures more distinguishable.
- Feature Extraction (DenseNet-121 + Residual Feature Stores, "RFS"): DenseNet-121 captures hierarchical representations. The RFS mechanism persistently aggregates feature maps from each dense block using the recursive rule R_l = R_{l−1} + α_l·F_l (with R_0 = 0), with the coefficients α_l modulating contributions; the aggregated tensor is ultimately flattened or pooled for downstream tasks.
- Hybrid Feature Selection (HHO–BA): Harris Hawks Optimization (HHO) facilitates exploration/exploitation while the Bat Algorithm (BA) refines locally. The objective is J(S) = Acc(S) − λ·|S|/N, where Acc(S) is cross-validated accuracy and the term λ·|S|/N penalizes feature cardinality.
- Classification (ViT + EfficientNet-B3, SSA–GWO Hyperparameter Tuning): Integrates Vision Transformer (ViT) for global context and EfficientNet-B3 for local detail. Fusion is achieved by concatenation and dense layers. Sparrow Search Algorithm (SSA) supplies initial global hyperparameter optimization, then Grey Wolf Optimizer (GWO) converges to optimal hyperparameters.
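As a minimal illustration of the preprocessing stage, the min–max normalization and a naive 3×3 median filter can be sketched in NumPy (function names and the toy slice below are illustrative, not taken from the paper; a production pipeline would use library implementations of CLAHE and filtering):

```python
import numpy as np

def minmax_normalize(img):
    """Rescale intensities to [0, 1] (min-max normalization)."""
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo + 1e-8)

def median_filter3(img):
    """Naive 3x3 median filter with edge replication (illustrative only)."""
    padded = np.pad(img, 1, mode="edge")
    stacked = np.stack([padded[i:i + img.shape[0], j:j + img.shape[1]]
                        for i in range(3) for j in range(3)])
    return np.median(stacked, axis=0)

# Toy 3x3 "CT slice" with a bright outlier that the median filter suppresses
slice_ = np.array([[0., 50., 100.], [25., 200., 75.], [10., 60., 90.]])
norm = minmax_normalize(median_filter3(slice_))
```

The ordering (denoise, then normalize) keeps the outlier from compressing the dynamic range of the rescaled image.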
2. Residual Feature Aggregation Principle
Residual Feature Stores (RFS) underpin SRFA's ability to aggregate deep network features without loss of discriminative information or propagation of gradient vanishing. For each dense block l, the output tensor F_l is integrated recursively: R_l = R_{l−1} + α_l·F_l, with R_0 = 0, where the α_l are learnable or fixed scalars. The final residual tensor R_L is either flattened or pooled to produce the feature vector f. This scheme enables cumulative feature richness across network depth, which is critical for capturing anatomical variation and subtle pathology signatures.
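The recursion above is straightforward to express directly; the sketch below aggregates per-block feature maps with fixed α coefficients (the 2×2 arrays and coefficient values are illustrative stand-ins for DenseNet-121 feature maps):

```python
import numpy as np

def aggregate_rfs(block_outputs, alphas):
    """Residual Feature Store: R_l = R_{l-1} + alpha_l * F_l, with R_0 = 0."""
    R = np.zeros_like(block_outputs[0])
    for F_l, a_l in zip(block_outputs, alphas):
        R = R + a_l * F_l
    return R.reshape(-1)  # flatten the final tensor for the downstream classifier

# Two mock dense-block outputs (in practice, DenseNet-121 feature maps)
F1, F2 = np.ones((2, 2)), 2.0 * np.ones((2, 2))
f = aggregate_rfs([F1, F2], alphas=[0.5, 0.25])
```

Because every block contributes additively, early-block features persist in the final vector rather than being overwritten by later layers.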
3. MAGRes-UNet Segmentation Model
MAGRes-UNet extends the canonical U-Net architecture by introducing residual connections and multi-attention gates on skip connections. Each encoder stage comprises two convolution–batch normalization–ReLU blocks with residual addition, y = ReLU(F(x) + x), where F denotes the stacked Conv–BN–ReLU transform.
Downsampling is performed via max pooling. Attention gates operate on encoder skip features x and decoder gating signals g, computing α = σ(ψᵀ·ReLU(W_x·x + W_g·g)) and passing the gated feature α·x forward.
Decoder blocks upsample, concatenate with attended skip features, and apply residual connections. Segmentation is finalized by a final convolution and sigmoid activation.
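A minimal additive attention gate consistent with this formulation can be sketched as follows (the 1-D feature vectors and random weights are illustrative; real gates operate on channel maps with learned convolutions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(x, g, W_x, W_g, psi):
    """alpha = sigma(psi^T ReLU(W_x x + W_g g)); returns the gated skip feature alpha * x."""
    q = np.maximum(W_x @ x + W_g @ g, 0.0)  # additive attention with ReLU
    alpha = sigmoid(psi @ q)                # scalar gating coefficient in (0, 1)
    return alpha * x

rng = np.random.default_rng(0)
x = rng.standard_normal(4)      # encoder skip feature
g = rng.standard_normal(4)      # decoder gating signal
W_x, W_g = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
psi = rng.standard_normal(4)
x_att = attention_gate(x, g, W_x, W_g, psi)
```

Since α lies in (0, 1), the gate can only attenuate the skip feature, letting the decoder suppress irrelevant encoder activations.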
4. Hybrid Metaheuristic Feature Selection
The dual-phase approach leverages HHO for global search and BA for exploitative refinement. Position and velocity update protocols for HHO and BA are run in tandem, balancing exploration and local optima convergence. The fitness objective combines cross-validated classification accuracy and a feature number penalty. HHO updates candidate subsets by mimicking predatory flight, while BA adjusts frequencies and velocities based on global best performance. This joint strategy optimizes feature subset selection for maximal discrimination with minimal dimensionality.
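Assuming the fitness form described above (cross-validated accuracy minus a cardinality penalty; the weight λ here is a hypothetical choice), a candidate feature mask could be scored as:

```python
def subset_fitness(mask, cv_accuracy, lam=0.1):
    """J(S) = Acc(S) - lam * |S| / N: reward accuracy, penalize subset size."""
    n_selected = sum(mask)
    return cv_accuracy - lam * n_selected / len(mask)

# Two hypothetical candidates with equal accuracy: the smaller subset scores higher
score_small = subset_fitness([1, 0, 0, 1, 0, 0, 0, 0], cv_accuracy=0.90)
score_large = subset_fitness([1, 1, 1, 1, 1, 1, 0, 0], cv_accuracy=0.90)
```

Both HHO and BA would evaluate candidate masks through such a function, so ties in accuracy are broken in favor of lower dimensionality.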
5. Classification and Dual Metaheuristic Hyperparameter Optimization
The classification head fuses global ViT representations and EfficientNet-B3 local descriptors:
- ViT branch utilizes self-attention to model context and semantics.
- EfficientNet-B3 branch extracts spatially rich, locally variant features.
Features are concatenated and passed through fully-connected layers. Hyperparameter search is orchestrated in two phases:
- SSA explores hyperparameter space, guided by ranked “sparrow” candidates.
- GWO subsequently drives convergence by emulation of pack-hunting behavior, refining towards optimal validation accuracy.
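The two-phase scheme can be caricatured as coarse global sampling followed by local refinement around the incumbent best. The toy objective, population sizes, and step sizes below are illustrative stand-ins, not the actual SSA/GWO update rules:

```python
import random

def val_score(lr, dropout):
    """Toy stand-in for validation accuracy as a function of two hyperparameters."""
    return 1.0 - (lr - 0.01) ** 2 * 1e4 - (dropout - 0.3) ** 2

random.seed(0)

# Phase 1: broad exploration of the search space (SSA-like global sampling)
candidates = [(random.uniform(1e-4, 0.1), random.uniform(0.0, 0.6))
              for _ in range(30)]
best = max(candidates, key=lambda c: val_score(*c))

# Phase 2: local refinement around the incumbent (GWO-like convergence)
for _ in range(100):
    lr = best[0] + random.gauss(0, 0.002)
    dp = best[1] + random.gauss(0, 0.02)
    if val_score(lr, dp) > val_score(*best):
        best = (lr, dp)
```

The refinement phase only ever accepts improvements, so its result is never worse than the best candidate found in the exploration phase.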
6. Experimental Results and Comparative Evaluation
Evaluation on multimodal contrast-enhanced CT slices with organ-balanced train/test split demonstrates substantial performance gains. Metrics include accuracy, precision, sensitivity (recall), specificity, and F1-score. Comparative results are summarized:
| Model | Accuracy (%) | F1 (%) | Specificity (%) |
|---|---|---|---|
| Vision Transformer (ViT) | 83.87 | 82.10 | 83.59 |
| EfficientNet-B3 | 89.96 | 89.20 | 89.96 |
| DenseNet-121 | 80.88 | 79.70 | 79.15 |
| Xception, ResNet-50, etc. | 74–79 | 74–79 | 74–79 |
| SRFA | 96.23 | 95.58 | 94.83 |
SRFA achieves approximately 6–12 percentage points higher accuracy and F1 than the best baselines; the improvements are attributed to residual aggregation, attention fusion, and metaheuristic-optimized feature selection and classification (Thiruvengadam et al., 29 Dec 2025).
7. Scalability and Computational Analysis
SRFA is architected for modularity and scalability:
- Parameter count: MAGRes-UNet (9M), DenseNet-121 + RFS (8M), ViT + EfficientNet-B3 (100M); total 117M.
- Computational complexity: segmentation costs O(k²·C_in·C_out·H·W) per convolutional layer; a DenseNet block with L layers incurs O(L²) dense connections; transformer self-attention costs O(n²·d) per head for n tokens of dimension d.
- Runtime: preprocessing (12 ms), segmentation (45 ms), feature extraction (30 ms), classification (60 ms); approximately 150 ms in total per 2D slice on a Tesla V100 GPU.
- Modular blocks enable substitution with lighter or deeper variants; RFS supports deeper architectures without gradient degradation; metaheuristic hyperparameter optimization accelerates development.
A plausible implication is that SRFA’s structural flexibility and computational efficiency position it as a candidate for near real-time clinical deployment in advanced medical imaging pipelines, subject to further validation for other neoplastic and anatomical contexts.