GeloVec: Geometric CNN Segmentation Model
- GeloVec is a convolutional neural network framework that uses higher-dimensional geometric smoothing to address boundary instabilities in segmentation.
- It integrates a modified Chebyshev distance, orthogonal basis transformation, and adaptive sampling to stabilize feature extraction and preserve object boundaries.
- Empirical results show mIoU gains up to 2.7% across benchmarks, demonstrating improved precision, generalization, and computational efficiency.
GeloVec is a convolutional neural network–based framework for semantic segmentation designed to address boundary instability and contextual discontinuities inherent in conventional attention-driven methods. By explicitly modeling the feature space as a higher-dimensional manifold and leveraging advanced geometric smoothing techniques, GeloVec achieves stabilized feature extraction, superior boundary preservation, and intra-class homogeneity in visual segmentation tasks. Its architecture combines a modified Chebyshev distance metric, a multispatial (orthogonal basis) transform, and adaptive sampling weights, grounded in Riemannian geometry, while maintaining computational efficiency and robust generalization across datasets (Kriuk et al., 2 May 2025).
1. High-Level Architecture and Design Objectives
GeloVec extends the U-Net–style encoder–decoder paradigm, employing a ResNet-34 backbone. The primary aims are twofold: stabilize attention maps near object boundaries and retain coherent feature representations within homogeneous regions. Conventional CNN-based segmentation often suffers from artifacts at boundaries and fails to maintain region consistency when using pixel-wise operators. GeloVec addresses both problems by casting activations into a higher-dimensional "feature manifold" and exploiting geometric relationships between neighboring features.
The architecture sequentially integrates four principal modules following each encoding stage:
- Orthogonal Basis Transform (OBT): Projects and re-orthogonalizes local descriptors into an expanded basis, enhancing the expressivity of local feature neighborhoods.
- Geometric Adaptive Sampling (GAS): Computes a learnable, Chebyshev-style distance field over the higher-dimensional feature space.
- Edge Preservation Mechanism (EPM): Gates feature mixing based on geometric distances to prevent cross-boundary information bleeding.
- Attention Aggregation: Modulates the standard dot-product attention mechanism using the distance field for improved spatial coherence.
This configuration is applied at four encoding scales: GeloVecLow, GeloVecMid, GeloVecHigh, and GeloVecVeryHigh, before spatial down-sampling. The decoder employs transposed convolutions and refined skip connections to output a binary mask.
2. Geometric Smoothing via Modified Chebyshev Distance
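GeloVec's geometric smoothing core is a weighted Chebyshev (ℓ∞) metric in the higher-dimensional feature space. For each center pixel $p$ with features $\mathbf{f}(p)$ and its neighborhood $\{p_i\}$, GeloVec learns per-offset weights $W_i$. The weighted Chebyshev distance for neighbor $p_i$ is defined as:
$$D_i(p) = \lVert W_i \odot (\mathbf{f}(p) - \mathbf{f}(p_i)) \rVert_\infty = \max_c \, \lvert W_{i,c}\,(f_c(p) - f_c(p_i)) \rvert$$
The aggregation operator computes:
$$D_{\max}(p) = \max_i D_i(p), \qquad D_{\text{norm}}(p) = \sigma\big(\mathrm{Conv}_{1\times 1}(D_{\max}(p))\big)$$
where $\sigma$ denotes the sigmoid activation. This process yields a robust, locally adaptive distance field, which acts as a smoothness constraint in subsequent processing.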
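The distance computation can be illustrated for a single pixel. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, and the learned 1×1 convolution is replaced by an assumed scalar `scale` and `bias`.

```python
import math

def weighted_chebyshev(center, neighbor, weights):
    """Largest weighted absolute channel difference between two feature vectors."""
    return max(abs(w * (a - b)) for w, a, b in zip(weights, center, neighbor))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def distance_field_value(center, neighbors, offset_weights, scale=1.0, bias=0.0):
    """Max over offsets of the weighted Chebyshev distance, squashed by a sigmoid.

    scale and bias are assumed stand-ins for the learned 1x1 convolution.
    """
    d_max = max(
        weighted_chebyshev(center, nb, w)
        for nb, w in zip(neighbors, offset_weights)
    )
    return sigmoid(scale * d_max + bias)

# One center pixel with three channels and two neighbors (uniform weights).
center = [0.2, 0.5, 0.1]
neighbors = [[0.2, 0.4, 0.1], [0.9, 0.5, 0.1]]
offset_weights = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
d_norm = distance_field_value(center, neighbors, offset_weights)
```

The second neighbor differs strongly in one channel, so it dominates the maximum. This is the behavior the text describes: a single salient channel difference is enough to flag a likely boundary.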
3. Adaptive Sampling Weights and Multispatial Transformation
The sampling weights $W_i$ are initialized uniformly and optimized end-to-end. Each $W_i$ scales the feature channels corresponding to the $i$-th spatial offset. The ℓ∞ norm then extracts the maximal, and thus most salient, channel difference, which highlights spatial boundaries.
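Prior to distance computation, OBT projects features into an $n$-fold expanded, orthonormal basis using a $1\times 1$ convolution and channel-wise normalization:
$$B_{\text{proj}} = \mathrm{Conv}_{1\times 1}(X) \in \mathbb{R}^{B \times nC' \times H \times W}, \qquad B_{\text{ortho}} = \frac{\tilde{B}}{\lVert \tilde{B} \rVert_2}, \quad \tilde{B} = \mathrm{reshape}(B_{\text{proj}};\, B, n, C', H, W)$$
where $B_{\text{proj}}$ is reshaped and normalized to ensure orthogonality along the channel axis. In matrix notation, the projection matrix is constructed with orthonormal rows. These orthogonalized vectors yield an expressive, locally discriminative tensor basis for geometric computations.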
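A per-pixel view of the projection step can be sketched as follows. This is an illustration under stated assumptions: `project_and_normalize` is a hypothetical name, the 1×1 convolution is written as a plain matrix product on one pixel, and only the L2 normalization of each group is shown (any further orthogonalization the method performs is not reproduced here).

```python
import math

def project_and_normalize(x, proj, n):
    """Expand one pixel's feature vector and L2-normalize each of n groups.

    x    : input channels of one pixel (length C)
    proj : weight rows of the 1x1 convolution (n*C' rows, each of length C)
    n    : expansion factor, i.e. the number of groups
    """
    expanded = [sum(w * v for w, v in zip(row, x)) for row in proj]
    group = len(expanded) // n
    basis = []
    for i in range(n):
        vec = expanded[i * group:(i + 1) * group]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # guard against zero vectors
        basis.append([v / norm for v in vec])
    return basis

# C = 2 input channels expanded to n = 2 groups of C' = 2 channels.
x = [1.0, 2.0]
proj = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
basis = project_and_normalize(x, proj, n=2)
```

Each group in `basis` has unit length, so later channel differences are compared on a common scale regardless of the raw activation magnitude.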
4. Riemannian Geometry Foundation
The theoretical underpinning of GeloVec draws on Riemannian geometry: the feature space is treated as a manifold whose metric governs intrinsic distances. Explicit curvature or geodesic equations are not derived. Instead, the Chebyshev-based adaptive distance field serves as an approximation to local geodesic computations. By focusing on the steepest channel-wise difference, the system approximates the maximal local metric change, which in theory stabilizes feature propagation and maintains segmentation fidelity under perturbations. This geometric smoothing is justified by analogy with stability results for the Laplace–Beltrami operator.
5. Parallel Implementation of Geodesic Transformations
Although not accompanied by source code, GeloVec’s processing is structured for efficient parallelization, leveraging standard GPU kernels for convolutions and reductions. The workflow involves the following stages:
Input: X ∈ ℝ^{B×C×H×W}
1. B_proj ← Conv1×1(X)                        # B×(nC')×H×W
2. B_ortho ← reshape(B_proj, B, n, C', H, W)
   B_ortho ← B_ortho / ‖B_ortho‖₂             # L2 norm over the channel axis
3. For each spatial position p (in parallel):
     Gather neighbors {p_i} (dilated sample)
     For each offset i (in parallel):
       Δ_i ← W_i ⊙ (B_ortho[:,i,:,p] − B_ortho[:,i,:,p_i])
       D_i ← max_channel(|Δ_i|)
     D_max(p) ← max_i D_i
     D_norm(p) ← sigmoid(Conv1×1(D_max(p)))
4. F_edge ← Conv3×3(B_ortho)                  # edge features
   G_edge ← sigmoid(Conv1×1(D_norm))
   Y_edge ← B_ortho*(1−G_edge) + F_edge*G_edge
5. Q,K,V ← Conv1×1(Y_edge) triplet
   A_raw ← softmax((Q·K^T)/(√d_k − λ·D_norm))
   Y_out ← A_raw · V
Output: Y_out
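Stages 4 and 5 can be sketched for a single query position. This is a hedged illustration rather than the published implementation: the function names are hypothetical, the learned 1×1 convolutions are replaced by the identity, and `d_norm` is assumed to be a scalar for the query pixel, since the pseudocode does not state how the distance field is broadcast inside the softmax.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def edge_gate(base, edge, d_norm):
    """Blend base and edge features; the gate grows with the distance field.

    The learned 1x1 convolution on d_norm is assumed to be the identity here.
    """
    g = sigmoid(d_norm)
    return [b * (1.0 - g) + e * g for b, e in zip(base, edge)]

def modulated_attention(q, keys, values, d_norm, lam):
    """Attention with scores q.k / (sqrt(d_k) - lam * d_norm) for one query."""
    denom = math.sqrt(len(q)) - lam * d_norm
    if denom <= 0:
        raise ValueError("lam * d_norm must stay below sqrt(d_k)")
    scores = [sum(a * b for a, b in zip(q, k)) / denom for k in keys]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    channels = len(values[0])
    out = [sum(w * v[c] for w, v in zip(weights, values)) for c in range(channels)]
    return weights, out

q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
plain_w, _ = modulated_attention(q, keys, values, d_norm=0.8, lam=0.0)
sharp_w, _ = modulated_attention(q, keys, values, d_norm=0.8, lam=1.0)
```

With `lam=0.0` the function reduces to standard scaled dot-product attention. A positive `lam` shrinks the denominator where the distance field is large, which sharpens the attention weights there. The explicit check on `denom` reflects a practical constraint of this formulation: the denominator must remain positive.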
6. Experimental Results and Comparative Evaluation
GeloVec has been validated on three benchmark datasets:
- Caltech CUB-200-2011 (CUB-200)
- Large-Scale Dataset for Segmentation and Classification (LSDSC)
- Flood Semantic Segmentation Dataset (FSSD)
Metrics included mean Intersection over Union (mIoU), F1 score, Precision, and Recall. GeloVec demonstrated mIoU increases of:
- +2.1% on CUB-200
- +2.7% on LSDSC
- +2.4% on FSSD
These gains were measured against U-Net, DeepLabV3+, HRNet, and SegFormer (MiT-B1). Precision increased by up to 4–5 points, underscoring improved boundary detection and region coherence.
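For reference, the reported metrics can be computed on binary masks as in the sketch below. The averaging convention is an assumption (mIoU taken as the mean of foreground and background IoU); the source does not specify how the metrics were aggregated, and `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(pred, target):
    """Compute mIoU, precision, recall and F1 for flattened 0/1 masks."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 0)

    def ratio(num, den):
        return num / den if den else 0.0  # empty classes score 0 by convention

    iou_fg = ratio(tp, tp + fp + fn)
    iou_bg = ratio(tn, tn + fp + fn)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "miou": (iou_fg + iou_bg) / 2.0,
        "precision": precision,
        "recall": recall,
        "f1": ratio(2.0 * precision * recall, precision + recall),
    }

pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
scores = binary_metrics(pred, target)
```

In this toy example there are two true positives, one false positive, one false negative, and two true negatives, so both class IoUs equal 0.5 and precision, recall, and F1 all equal 2/3.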
7. Computational Efficiency, Generalization, and Implementation Considerations
GeloVec maintains efficiency by retaining the ResNet-34 encoder and utilizing GPU-native operations: convolutions, ℓ∞-norms, and max-over-neighbor reductions. All geometric modules are fusable with standard deep learning kernels. The geometry-aware smoothing mechanism is dataset-agnostic, eliminating the need for hand-tuned edge losses and enabling robust transfer across domains such as bird contour detection and flood boundary mapping.
Practical recommendations include:
- Utilizing grouped convolutions in the OBT to control parameter count.
- Precomputing dilated neighbor indices to avoid runtime overhead.
- Tuning the parameter λ in the attention softmax denominator to interpolate between standard attention (λ = 0) and geometry-modulated attention.
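The recommendation to precompute dilated neighbor indices can be sketched as a lookup table built once per feature-map size. The 3×3 neighborhood, the border clamping, and the function name are assumptions made for illustration; the source does not specify the neighborhood shape or the border policy.

```python
def dilated_neighbor_indices(height, width, dilation=1):
    """Build, for every pixel, the flat indices of its 8 dilated neighbors.

    Pixels are indexed row-major; coordinates that fall outside the image
    are clamped to the nearest border pixel (an assumed border policy).
    """
    offsets = [
        (dy, dx)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    ]
    table = []
    for y in range(height):
        for x in range(width):
            row = []
            for dy, dx in offsets:
                ny = min(max(y + dy * dilation, 0), height - 1)
                nx = min(max(x + dx * dilation, 0), width - 1)
                row.append(ny * width + nx)
            table.append(row)
    return table

# Built once and reused for every forward pass at this resolution.
table = dilated_neighbor_indices(4, 4, dilation=2)
```

Because the table depends only on the spatial size and the dilation, it can be cached per encoding scale, and the neighbor gather in the distance computation becomes a plain indexed read.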
Overall, GeloVec exemplifies a coherent integration of higher-dimensional geometric feature smoothing, orthogonal projection, and attention gating, enabling sharper and more stable segmentation masks without substantial computational overhead (Kriuk et al., 2 May 2025).