Spatial Feature Enhancement via Adaptive Fusion
- The paper presents SFE-GAF, which adaptively fuses spatial features by integrating geometric, semantic, and spectral cues to enhance object delineation.
- It employs deformable convolutional kernels and class-aware weighting to optimize segmentation in both medical imaging and remote sensing environments.
- Empirical results show improvements in F1 score, sensitivity, and reduced RMSE, validating the efficacy of geometry-adaptive fusion in complex scenarios.
Spatial Feature Enhancement with Geometrically Adaptive Fusion (SFE-GAF) is an advanced methodology for multi-source feature integration that prioritizes the geometric, semantic, and spectral characteristics of spatial data. SFE-GAF has been developed in both deep learning-based segmentation networks—such as DB-KAUNet for medical imaging—and non-parametric filter-driven frameworks for remote sensing applications. Central to SFE-GAF is the notion that adaptive spatial sampling and class-aware weighting can significantly bolster the fidelity and robustness of fused representations, particularly in domains characterized by complex, non-linear geometry (e.g., retinal vasculature, urban DSMs). SFE-GAF strategically aligns sampling regions and fusion weights to salient object morphologies, utilizing deformable convolutional kernels or geometry-and-class-sensitive weight maps.
1. Theoretical Underpinnings and Core Fusion Workflow
SFE-GAF formalizes spatial fusion as an adaptive, weighted synthesis of multiple feature or height maps. For temporally registered DSMs or analogous gridded feature sets $X_1,\dots,X_N$, the enhanced output at location $p$ is given by

$$\hat{X}(p) = \frac{\sum_{i=1}^{N} w_i(p)\, X_i(p)}{\sum_{i=1}^{N} w_i(p)},$$

where $w_i(p)$ denotes the normalized fusion weight determined by spatial, spectral, geometric, and semantic factors. Weights are constructed as the product of bilateral spatial-spectral proximity,

$$w_s \, w_r = \exp\!\Big(-\frac{\|p - p_i\|^2}{2\sigma_s^2}\Big)\, \exp\!\Big(-\frac{\|I(p) - I(p_i)\|^2}{2\sigma_r^2}\Big),$$

multiplied by geometric similarity,

$$w_g = \exp\!\Big(-\frac{\|n_{\mathrm{ref}}(p) - n_i(p)\|^2}{2\sigma_n^2}\Big),$$

and class-adaptive height consistency,

$$w_h = \exp\!\Big(-\frac{|X_{\mathrm{med}}(p) - X_i(p)|^2}{2(\sigma_h^{(c)})^2}\Big),$$

where $X_{\mathrm{med}}$ is the per-pixel median across maps and $\sigma_h^{(c)}$ is estimated from per-class statistics for the semantic class $c = \mathrm{cls}(p)$. This multiplicative weighting structure underpins SFE-GAF's capacity to respect object boundaries, geometry, and semantic uncertainty (Albanwan, 27 Apr 2024).
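To make the multiplicative weighting concrete, the product of Gaussian kernels can be sketched in plain Python for a single pixel. All numeric values below (heights, σ parameters) are illustrative assumptions, not figures from the paper; the spatial, spectral, and geometric distances are set to zero to isolate the class-adaptive height term.

```python
import math

def fusion_weight(d2_spatial, d2_spectral, d2_geom, d2_height,
                  sigma_s, sigma_r, sigma_n, sigma_h_c):
    """Multiplicative SFE-GAF-style weight: product of four Gaussian kernels."""
    return (math.exp(-d2_spatial  / (2 * sigma_s ** 2)) *
            math.exp(-d2_spectral / (2 * sigma_r ** 2)) *
            math.exp(-d2_geom     / (2 * sigma_n ** 2)) *
            math.exp(-d2_height   / (2 * sigma_h_c ** 2)))

# Three co-registered height samples at one pixel (illustrative values).
heights = [10.2, 10.4, 14.0]               # the third sample is an outlier
med = sorted(heights)[1]                   # per-pixel median reference
weights = [fusion_weight(0.0, 0.0, 0.0, (med - h) ** 2,
                         sigma_s=1.0, sigma_r=1.0, sigma_n=1.0, sigma_h_c=1.0)
           for h in heights]
fused = sum(w * h for w, h in zip(weights, heights)) / sum(weights)
# The outlier receives a near-zero weight, so the fused height stays near the median.
```

Because the outlier's height deviation enters a Gaussian with a tight class-specific spread, its contribution is effectively suppressed rather than averaged in.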
2. Geometrically Adaptive Convolution: Deep Learning Implementation
Within deep encoder-decoder architectures, notably DB-KAUNet for retinal vessel segmentation, SFE-GAF deploys a Linear Deformable Convolution (LDConv) to replace fixed receptive fields. The LDConv utilizes an X-shaped sampling grid with points structured along the image diagonals, formally

$$\mathcal{P} = \{(j, j) : j = -m,\dots,m\} \cup \{(j, -j) : j = -m,\dots,m\},$$

yielding $4m+1$ sampling positions for grid half-width $m$. Adaptive offsets $\Delta p_k$ for each sample enable the receptive field to bend along elongated morphologies. The output at channel $c$ and location $p_0$ is

$$y_c(p_0) = \sum_{p_k \in \mathcal{P}} w_{c,k}\, x(p_0 + p_k + \Delta p_k),$$

where bilinear interpolation is used to evaluate $x$ at fractional coordinates. Branch fusion then merges the LDConv-enhanced features with the parallel stream, integrating local (CNN) and global (Transformer) representations. This structure enables vessel-centric adaptation and background suppression in segmentation tasks (Xu et al., 1 Dec 2025).
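A minimal sketch of the X-shaped grid and its bilinearly interpolated response, under the grid form described above. In the actual network the offsets and weights are learned per location; here they are simplified to explicit arguments, and the function names are illustrative.

```python
import math

def x_shaped_grid(m):
    """X-shaped base sampling grid: points along both image diagonals."""
    pts = {(j, j) for j in range(-m, m + 1)} | {(j, -j) for j in range(-m, m + 1)}
    return sorted(pts)  # 4m + 1 distinct positions (center shared by both diagonals)

def bilinear(img, y, x):
    """Bilinear interpolation of a 2-D list `img` at fractional coordinates (y, x)."""
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    dy, dx = y - y0, x - x0
    def px(r, c):
        if 0 <= r < len(img) and 0 <= c < len(img[0]):
            return img[r][c]
        return 0.0  # zero padding outside the image
    return ((1 - dy) * (1 - dx) * px(y0, x0) + (1 - dy) * dx * px(y0, x0 + 1)
            + dy * (1 - dx) * px(y0 + 1, x0) + dy * dx * px(y0 + 1, x0 + 1))

def ldconv_response(img, p0, offsets, weights, base_grid):
    """Weighted sum over the deformed X-grid: y(p0) = sum_k w_k * x(p0 + p_k + dp_k)."""
    y0, x0 = p0
    return sum(w * bilinear(img, y0 + py + oy, x0 + px_ + ox)
               for w, (py, px_), (oy, ox) in zip(weights, base_grid, offsets))
```

With nonzero learned offsets, the sampled points drift off the diagonals and can trace a curved vessel; the zero-offset case reduces to a fixed X-shaped kernel.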
3. Algorithmic Description and Implementation Details
For non-parametric fusion in remote sensing, SFE-GAF executes a filter in which, for each pixel $p$, the semantic class $\mathrm{cls}(p)$ and local statistics are computed; weights for each feature map $X_i$ are then calculated and aggregated:
Inputs: X[1..N](p), n[1..N](p), I(p), cls(p), σ_s, σ_r, σ_n, {σ_h^(c)}
For each pixel p:
    c ← cls(p)
    X_med(p) ← median_i{ X[i](p) }
    num ← 0; den ← 0
    For i = 1..N:
        p_i ← registered sample location of map i at p
        w_s ← exp( -||p - p_i||^2 / (2σ_s^2) )
        w_r ← exp( -||I(p) - I(p_i)||^2 / (2σ_r^2) )
        w_g ← exp( -||n[1](p) - n[i](p)||^2 / (2σ_n^2) )    # map 1 serves as normal reference
        w_h ← exp( -|X_med(p) - X[i](p)|^2 / (2(σ_h^(c))^2) )
        w ← w_s · w_r · w_g · w_h
        num ← num + w · X[i](p)
        den ← den + w
    X_fused(p) ← num / den
Output: X_fused
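The pseudocode above can be turned into a runnable sketch. This simplified pure-Python version keeps the class-adaptive height term and an optional spectral term; the spatial and normal terms are constant here, since perfectly co-registered maps without normal estimates are assumed. All names and parameter values are illustrative.

```python
import math
from statistics import median

def sfe_gaf_fuse(maps, cls_map, sigma_h_by_class, intens=None, sigma_r=1.0):
    """Per-pixel SFE-GAF-style fusion of N co-registered height maps.
    maps: list of N height maps (H x W nested lists); cls_map: per-pixel class labels;
    sigma_h_by_class: class -> height-consistency spread (the per-class statistic)."""
    H, W = len(maps[0]), len(maps[0][0])
    fused = [[0.0] * W for _ in range(H)]
    for r in range(H):
        for c_ in range(W):
            sigma_h = sigma_h_by_class[cls_map[r][c_]]   # class-adaptive spread
            samples = [m[r][c_] for m in maps]
            x_med = median(samples)                      # robust per-pixel reference
            num = den = 0.0
            for i, x in enumerate(samples):
                w = math.exp(-(x_med - x) ** 2 / (2 * sigma_h ** 2))
                if intens is not None:                   # optional spectral term
                    d = intens[i][r][c_] - intens[0][r][c_]
                    w *= math.exp(-d * d / (2 * sigma_r ** 2))
                num += w * x
                den += w
            fused[r][c_] = num / den
    return fused
```

A tight σ_h for rigid classes (buildings) rejects blunders aggressively, while a loose σ_h for variable classes (vegetation) tolerates legitimate height spread, mirroring the per-class statistics in the filter.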
4. Integration within Neural Architectures and Fusion Pipelines
In DB-KAUNet, SFE-GAF modules are deployed in the third and fifth encoder stages, directly downstream of the Cross-Branch Channel Interaction (CCI) unit. Inputs are the fused outputs of parallel CNN/Transformer blocks, processed to unify dimensions and optimize skip connections. The LDConv output channel count matches decoder expectations (e.g., 256 or 128). Training employs the AdamW optimizer with cosine annealing, the paper's segmentation loss, and early stopping on the F1 score (Xu et al., 1 Dec 2025).
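The cosine-annealing schedule and F1-based early stopping mentioned above follow standard forms; a minimal sketch, where the learning-rate bounds and patience are illustrative assumptions rather than the paper's settings:

```python
import math

def cosine_annealed_lr(step, total_steps, lr_max=1e-3, lr_min=1e-6):
    """Standard cosine annealing: decays lr_max -> lr_min over total_steps."""
    t = min(step, total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / total_steps))

class EarlyStopper:
    """Signal a stop when the monitored F1 score fails to improve for `patience` epochs."""
    def __init__(self, patience=10):
        self.patience, self.best, self.bad = patience, -float("inf"), 0
    def update(self, f1):
        if f1 > self.best:
            self.best, self.bad = f1, 0
        else:
            self.bad += 1
        return self.bad >= self.patience  # True -> stop training
```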
For filter-based pipelines in remote sensing, SFE-GAF operates as a single fusion layer, requiring only band statistics and class assignment for operation. No learnable parameters are involved, but in deep variants weighted-sum fusion may be replaced by a trainable convolution with appropriate reconstruction, geometry, and semantic consistency losses (Albanwan, 27 Apr 2024).
5. Quantitative Performance and Ablation Analysis
The adoption of geometrically adaptive fusion is empirically validated. In DB-KAUNet ablations on DRIVE, STARE, and CHASE_DB1, substituting standard SFE with SFE-GAF yields marked improvements:
- F1 increases from 0.8812 to 0.8964 (+1.52%)
- Sensitivity rises from 0.8826 to 0.8985 (+1.59%)
- Accuracy increases from 0.9701 to 0.9739 (+0.38%)

This demonstrates enhanced vessel detection and segmentation quality attributable to SFE-GAF's adaptive sampling (Xu et al., 1 Dec 2025).
In DSM fusion benchmarks (Omaha, Jacksonville, Argentina, London), SFE-GAF consistently delivers the lowest RMSE—up to 6 m better than non-adaptive alternatives. Per-class accuracy benefits are most pronounced for vegetation classes, increasing 2–6% over median fusion. Qualitative examples underline improved edge fidelity and suppression of typical "salt-pepper" artifacts (Albanwan, 27 Apr 2024).
6. Comparison with Conventional Fusion Schemes
SFE-GAF outperforms traditional median, adaptive median, K-median, and global weighted-average fusion schemes. Where nonadaptive methods blur object boundaries or fail when local height histograms are multi-modal, SFE-GAF delineates individual objects despite complex class and geometric landscapes. The filter's internal class-wise noise modeling improves separation, particularly in scenes with mixed building/vegetation structure. Visual assessment corroborates the advantage of SFE-GAF in urban blocks with intricate topography (Albanwan, 27 Apr 2024).
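The boundary-blurring failure of nonadaptive averaging can be seen in a one-pixel example (all numbers illustrative): a global average mixes roof and ground heights into a phantom intermediate value, while a median-referenced Gaussian weight keeps the estimate on one surface.

```python
import math

# Height samples at an edge pixel: three maps see the roof (~20 m), two see ground (~2 m).
samples = [20.0, 19.8, 20.2, 2.0, 2.1]
global_avg = sum(samples) / len(samples)     # nonadaptive: a height on neither surface

med = sorted(samples)[len(samples) // 2]     # robust median reference (roof side)
sigma_h = 1.0                                # tight class-adaptive spread (assumed)
w = [math.exp(-(med - x) ** 2 / (2 * sigma_h ** 2)) for x in samples]
adaptive = sum(wi * xi for wi, xi in zip(w, samples)) / sum(w)
# The adaptive estimate stays on the roof surface instead of averaging across the edge.
```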
7. Design Rationale and Plausible Implications
The X-shaped grid in LDConv is well suited to capturing morphologies like tortuous vessels: it aligns the receptive field with elongated structures while minimizing background sampling. Learned offsets allow dynamic bending along vessel trajectories or object boundaries, and the semantic-height weighting confers additional robustness to class-wise variability. This suggests SFE-GAF is applicable to any spatial fusion task involving thin, directional, or semantically heterogeneous structures. A plausible implication is that similar adaptive fusion paradigms may benefit precise delineation in other biomedical or geospatial modalities characterized by localized geometry and diverse class statistics.
Summary Table: SFE-GAF Key Features and Empirical Gains
| Domain | Method | Key Fusion Mechanism | Performance Gain |
|---|---|---|---|
| Retinal Imaging | SFE-GAF in DB-KAUNet (Xu et al., 1 Dec 2025) | X-shaped LDConv + Attention | +1.5% F1, +1.6% SE, +0.4% ACC |
| Remote Sensing | SFE-GAF Filter (Albanwan, 27 Apr 2024) | Geometry/semantic-weighted sum | 1–6 m RMSE reduction; 2–6% class accuracy gain |
SFE-GAF represents a technically rigorous, geometry- and class-aware approach to spatial feature fusion, validated in diverse high-precision contexts and offering substantial improvements in edge fidelity and object-specific segmentation accuracy.