GeloVec: Geometric CNN Segmentation Model

Updated 16 February 2026
  • GeloVec is a convolutional neural network framework that uses higher-dimensional geometric smoothing to address boundary instabilities in segmentation.
  • It integrates a modified Chebyshev distance, orthogonal basis transformation, and adaptive sampling to stabilize feature extraction and preserve object boundaries.
  • Empirical results show mIoU gains up to 2.7% across benchmarks, demonstrating improved precision, generalization, and computational efficiency.

GeloVec is a convolutional neural network–based framework for semantic segmentation designed to address boundary instability and contextual discontinuities inherent in conventional attention-driven methods. By explicitly modeling the feature space as a higher-dimensional manifold and leveraging advanced geometric smoothing techniques, GeloVec achieves stabilized feature extraction, superior boundary preservation, and intra-class homogeneity in visual segmentation tasks. Its architecture combines a modified Chebyshev distance metric, a multispatial (orthogonal basis) transform, and adaptive sampling weights, grounded in Riemannian geometry, while maintaining computational efficiency and robust generalization across datasets (Kriuk et al., 2 May 2025).

1. High-Level Architecture and Design Objectives

GeloVec extends the U-Net–style encoder–decoder paradigm, employing a ResNet-34 backbone. It has two primary aims: stabilizing attention maps near object boundaries and retaining coherent feature representations within homogeneous regions. Conventional CNN-based segmentation often suffers from artifacts at boundaries and fails to maintain region consistency when using pixel-wise operators. GeloVec addresses both problems by casting activations into a higher-dimensional "feature manifold" and exploiting geometric relationships.

The architecture sequentially integrates four principal modules following each encoding stage:

  • Orthogonal Basis Transform (OBT): Projects and re-orthogonalizes local descriptors into an expanded basis, enhancing the expressivity of local feature neighborhoods.
  • Geometric Adaptive Sampling (GAS): Computes a learnable, Chebyshev-style distance field over the higher-dimensional feature space.
  • Edge Preservation Mechanism (EPM): Gates feature mixing based on geometric distances to prevent cross-boundary information bleeding.
  • Attention Aggregation: Modulates the standard dot-product attention mechanism using the distance field for improved spatial coherence.

This configuration is applied at four encoding scales: GeloVecLow, GeloVecMid, GeloVecHigh, and GeloVecVeryHigh, before spatial down-sampling. The decoder employs transposed convolutions and refined skip connections to output a $224 \times 224$ binary mask.

2. Geometric Smoothing via Modified Chebyshev Distance

GeloVec's geometric smoothing core is a weighted Chebyshev (ℓ∞) metric in $n$-dimensional feature space. For each center pixel $p_c$ with features $F_{p_c}$ and its neighborhood $\mathcal{N}(p_c)$, GeloVec learns per-offset weights $W_i \in \mathbb{R}^{C'}$. The weighted Chebyshev distance for neighbor $p_i$ is defined as:

$$D_{\infty}(p_c,p_i) = \max_{d=1,\dots,C'} \left| [ W_i \odot (F_{p_c} - F_{p_i}) ]_d \right|$$

The aggregation operator computes:

$$D_{\mathrm{norm}}(p_c) = \sigma\left( \text{Conv}_{1\times1} \left( \max_{p_i \in \mathcal{N}(p_c)} D_{\infty}(p_c,p_i) \right) \right) \in [0,1]$$

where $\sigma$ denotes the sigmoid activation. This process yields a robust, locally adaptive distance field, which acts as a smoothness constraint in subsequent processing.
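The two formulas above can be sketched in NumPy for a single feature map. This is an illustrative sketch, not the authors' implementation: the function name, the edge-replicated border handling, and the scalars `conv_w` and `conv_b` (standing in for the learned $1\times1$ convolution on a single-channel map) are all assumptions.

```python
import numpy as np

def chebyshev_distance_field(features, weights, offsets, conv_w=1.0, conv_b=0.0):
    """Return D_norm of shape (H, W) for one feature map.

    features: (C, H, W) array; weights: (K, C) per-offset channel weights W_i;
    offsets: K integer (dy, dx) pairs. conv_w and conv_b are hypothetical
    scalars standing in for the learned 1x1 convolution.
    """
    _, height, width = features.shape
    d_max = np.zeros((height, width))
    for w_i, (dy, dx) in zip(weights, offsets):
        py, px = abs(dy), abs(dx)
        # Edge-replicate so border pixels still have a neighbor at this offset.
        padded = np.pad(features, ((0, 0), (py, py), (px, px)), mode="edge")
        neighbor = padded[:, py + dy:py + dy + height, px + dx:px + dx + width]
        # D_inf(p_c, p_i): largest weighted absolute channel difference.
        d_i = np.abs(w_i[:, None, None] * (features - neighbor)).max(axis=0)
        d_max = np.maximum(d_max, d_i)  # max over the neighborhood
    return 1.0 / (1.0 + np.exp(-(conv_w * d_max + conv_b)))  # sigmoid

# Two flat regions separated by a vertical boundary between columns 1 and 2.
features = np.zeros((2, 4, 4))
features[:, :, 2:] = 1.0
offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
weights = np.ones((len(offsets), 2))
d_norm = chebyshev_distance_field(features, weights, offsets)
```

In this toy input the field takes its baseline value inside each flat region and rises only in the two columns that touch the boundary, which is the behavior the smoothness constraint relies on.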

3. Adaptive Sampling Weights and Multispatial Transformation

The sampling weights $W_i$ are initialized uniformly and optimized end-to-end. Each $W_i$ scales the feature channels corresponding to the $i$th spatial offset, with the ℓ∞ norm extracting the maximal, and thus most salient, channel difference, which highlights spatial boundaries.

Prior to distance computation, OBT projects features into an $n$-fold expanded, orthonormal basis using a $1\times1$ convolution and channel-wise $\ell_2$ normalization:

$$X \rightarrow B_\mathrm{proj} \in \mathbb{R}^{B \times (nC') \times H \times W}$$

$$B_\mathrm{ortho} = \frac{\hat B}{\|\hat B\|_{2,\;\mathrm{channel}}}$$

where $\hat B$ is reshaped and normalized to ensure orthogonality along the channel axis. In matrix notation, the $1\times1$ projection matrix $M \in \mathbb{R}^{nC' \times C}$ is constructed with orthonormal rows. These orthogonalized vectors yield an expressive, locally discriminative tensor basis for geometric computations.
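A minimal NumPy sketch of the projection and normalization follows. The source does not say how $M$ is built, so the QR-based construction, the function names, and the `eps` stabilizer are assumptions. Note that a matrix of shape $nC' \times C$ can only have orthonormal rows when $nC' \le C$; for a larger output dimension the helper falls back to orthonormal columns.

```python
import numpy as np

def make_projection(out_ch, in_ch, seed=0):
    """Hypothetical helper: (out_ch, in_ch) matrix with orthonormal rows when
    out_ch <= in_ch, otherwise with orthonormal columns."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((max(out_ch, in_ch), min(out_ch, in_ch)))
    q, _ = np.linalg.qr(a)  # reduced QR: q has orthonormal columns
    return q if out_ch >= in_ch else q.T

def orthogonal_basis_transform(x, m, n, eps=1e-8):
    """x: (B, C, H, W); m: (n*C', C). Returns B_ortho with shape (B, n, C', H, W)."""
    batch, _, height, width = x.shape
    b_proj = np.einsum("oc,bchw->bohw", m, x)           # 1x1 convolution as a matmul
    b_hat = b_proj.reshape(batch, n, -1, height, width)  # n groups of C' channels
    norm = np.linalg.norm(b_hat, axis=2, keepdims=True)  # l2 norm over C'
    return b_hat / (norm + eps)

rng = np.random.default_rng(1)
x = rng.standard_normal((2, 8, 5, 5))   # B=2, C=8
m = make_projection(4, 8)               # n=2, C'=2, so n*C' = 4 <= C
b_ortho = orthogonal_basis_transform(x, m, n=2)
```

After the transform, every length-$C'$ descriptor has unit $\ell_2$ norm, so the subsequent Chebyshev differences compare directions rather than magnitudes.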

4. Riemannian Geometry Foundation

The theoretical underpinning of GeloVec incorporates Riemannian geometry, wherein the feature space manifold $(\mathcal{M}, g)$ possesses a metric $g$ that governs intrinsic distances. While explicit curvature or geodesic equations are not derived, GeloVec's approach of embedding the Chebyshev-based, adaptive distance field $D_{\mathrm{norm}}$ approximates local geodesic computations:

$$d_g(x, y) = \inf_{\gamma: [0,1] \to \mathcal{M},\; \gamma(0) = x,\; \gamma(1) = y} \int_0^1 \sqrt{g_{\gamma(t)}(\dot \gamma(t), \dot \gamma(t))}\,dt$$

By focusing on the steepest channel-wise difference, the system approximates maximal local metric change, which theoretically stabilizes feature propagation and maintains segmentation fidelity under perturbations. This geometric smoothing is justified as analogous to stability results from the Laplace–Beltrami operator.
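One way to make this approximation concrete is to look at the simplest case. Assume, purely for illustration and not as a claim from the paper, a constant diagonal metric $g = \mathrm{diag}(W_i \odot W_i)$ with every entry of $W_i$ nonzero. Geodesics are then straight lines, the geodesic distance is a weighted Euclidean norm, and the standard equivalence between the $\ell_\infty$ and $\ell_2$ norms on $\mathbb{R}^{C'}$ bounds it on both sides by the weighted Chebyshev distance:

$$d_g(x,y) = \| W_i \odot (x - y) \|_2, \qquad D_{\infty}(x,y) \;\le\; d_g(x,y) \;\le\; \sqrt{C'}\, D_{\infty}(x,y)$$

where $D_{\infty}(x,y) = \| W_i \odot (x - y) \|_\infty$. Under this assumption the Chebyshev field tracks the geodesic distance up to a factor of at most $\sqrt{C'}$; for a metric that varies across the manifold, the bound holds only locally.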

5. Parallel Implementation of Geodesic Transformations

Although not accompanied by source code, GeloVec’s processing is structured for efficient parallelization, leveraging standard GPU kernels for convolutions and reductions. The workflow involves the following stages:

Input: X ∈ ℝ^{B×C×H×W}
1. B_proj ← Conv1×1(X)                        # B×(nC')×H×W
2. B_ortho ← reshape(B_proj, B, n, C', H, W)
   B_ortho ← B_ortho / ‖B_ortho‖₂(channel)
3. For each spatial position p (in parallel):
     Gather neighbors {p_i} (dilated sample)
     For each offset i (in parallel):
         Δ_i ← W_i ⊙ (B_ortho[:,i,:,p] − B_ortho[:,i,:,p_i])
         D_i ← max_channel(|Δ_i|)
     D_max(p) ← max_i D_i
     D_norm(p) ← sigmoid(Conv1×1(D_max(p)))
4. F_edge ← Conv3×3(B_ortho)                  # edge features
   G_edge ← sigmoid(Conv1×1(D_norm))
   Y_edge ← B_ortho*(1−G_edge) + F_edge*G_edge
5. Q,K,V ← Conv1×1(Y_edge) triplet
   A_raw ← softmax((Q·K^T)/(d_k  λ·D_norm))
   Y_out ← A_raw · V
Output: Y_out
All steps are parallelized across batch, channel, and spatial dimensions, and exploit standard convolution, dilation, and matrix multiplication kernels.
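Step 4 of the workflow, the edge-gated blend, can be sketched in NumPy for a single sample with channels flattened to shape (C, H, W). The fixed Laplacian-like kernel applied per channel, the scalar gate parameters `gate_w` and `gate_b`, and all function names are assumptions standing in for the learned convolutions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def depthwise_conv3x3(x, kernel):
    """Apply one 3x3 kernel to every channel of x (C, H, W) with edge padding."""
    _, height, width = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
    out = np.zeros(x.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[:, dy:dy + height, dx:dx + width]
    return out

def edge_gate(b_ortho, d_norm, kernel, gate_w=1.0, gate_b=0.0):
    """Y_edge = B_ortho * (1 - G_edge) + F_edge * G_edge.

    gate_w and gate_b are hypothetical scalars replacing the learned 1x1 conv.
    """
    f_edge = depthwise_conv3x3(b_ortho, kernel)
    g_edge = sigmoid(gate_w * d_norm + gate_b)[None]  # (1, H, W), broadcast over C
    return b_ortho * (1.0 - g_edge) + f_edge * g_edge

rng = np.random.default_rng(0)
b_ortho = rng.standard_normal((3, 6, 6))
d_norm = rng.random((6, 6))
kernel = np.array([[0.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 0.0]])
y_edge = edge_gate(b_ortho, d_norm, kernel)
```

The gate is a convex blend: where it is near 0 the features pass through unchanged, and where it is near 1 they are replaced by the edge features, so the distance field controls how much edge information enters at each pixel.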

6. Experimental Results and Comparative Evaluation

GeloVec has been validated on three benchmark datasets:

  • Caltech CUB-200-2011 (CUB-200)
  • Large-Scale Dataset for Segmentation and Classification (LSDSC)
  • Flood Semantic Segmentation Dataset (FSSD)

Metrics included mean Intersection over Union (mIoU), F1 score, Precision, and Recall. GeloVec demonstrated mIoU increases of:

  • +2.1% on CUB-200
  • +2.7% on LSDSC
  • +2.4% on FSSD

These gains were measured against U-Net, DeepLabV3+, HRNet, and SegFormer (MiT-B1). Precision increased by up to 4–5 points, underscoring improved boundary detection and region coherence.
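For reference, the four reported metrics can be computed from a predicted and a ground-truth binary mask as sketched below. The source does not state how mIoU is averaged, so taking the mean of foreground and background IoU is an assumption, as are the function names and the convention of returning 0 for an empty denominator.

```python
import numpy as np

def _ratio(num, den):
    # Convention assumed here: an undefined ratio is reported as 0.0.
    return float(num) / float(den) if den > 0 else 0.0

def binary_segmentation_metrics(pred, target):
    """Return mIoU, precision, recall, and F1 for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = int(np.sum(pred & target))
    fp = int(np.sum(pred & ~target))
    fn = int(np.sum(~pred & target))
    tn = int(np.sum(~pred & ~target))
    iou_fg = _ratio(tp, tp + fp + fn)
    iou_bg = _ratio(tn, tn + fp + fn)
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "miou": 0.5 * (iou_fg + iou_bg),  # assumed: mean over the two classes
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2.0 * precision * recall, precision + recall),
    }

pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
metrics = binary_segmentation_metrics(pred, target)
```

The toy masks are only there to exercise the function; they do not reproduce any figure from the evaluation.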

7. Computational Efficiency, Generalization, and Implementation Considerations

GeloVec maintains efficiency by retaining the ResNet-34 encoder and utilizing GPU-native operations: $1\times 1$ convolutions, ℓ∞-norms, and max-over-neighbor reductions. All geometric modules are fusable with standard deep learning kernels. The geometry-aware smoothing mechanism is dataset-agnostic, eliminating the need for hand-tuned edge losses and enabling robust transfer across domains such as bird contour detection and flood boundary mapping.

Practical recommendations include:

  • Utilizing grouped $1\times1$ convolutions in the OBT to control parameter count.
  • Precomputing dilated neighbor indices to avoid runtime overhead.
  • Tuning the $\lambda$ parameter in the attention softmax denominator to interpolate between standard and geometry-modulated attention.
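The effect of $\lambda$ can be illustrated with a small NumPy sketch. The operator joining d_k and λ·D_norm in the workflow's softmax denominator is not legible in the text, so this sketch assumes the form $\sqrt{d_k} + \lambda \cdot D_{\mathrm{norm}}$, chosen only because it reduces to standard scaled dot-product attention at $\lambda = 0$. The function names and the per-query application of $D_{\mathrm{norm}}$ are likewise assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract the max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def geometry_modulated_attention(q, k, v, d_norm, lam):
    """q, k, v: (N, d_k) token matrices with N = H*W; d_norm: (N,) in [0, 1].

    ASSUMPTION: denominator = sqrt(d_k) + lam * d_norm, one value per query.
    """
    d_k = q.shape[-1]
    denom = np.sqrt(d_k) + lam * d_norm[:, None]  # (N, 1) temperature per query row
    attn = softmax(q @ k.T / denom, axis=-1)
    return attn @ v

rng = np.random.default_rng(0)
n_tokens, d_k = 16, 8
q = rng.standard_normal((n_tokens, d_k))
k = rng.standard_normal((n_tokens, d_k))
v = rng.standard_normal((n_tokens, d_k))
d_norm = rng.random(n_tokens)

standard = softmax(q @ k.T / np.sqrt(d_k), axis=-1) @ v
y_plain = geometry_modulated_attention(q, k, v, d_norm, lam=0.0)
y_geo = geometry_modulated_attention(q, k, v, d_norm, lam=2.0)
```

Under this assumed form, a larger $\lambda$ raises the softmax temperature for queries with a high distance value, flattening their attention weights, while queries in homogeneous regions stay close to the standard behavior.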

Overall, GeloVec exemplifies a coherent integration of higher-dimensional geometric feature smoothing, orthogonal projection, and attention gating, enabling sharper and more stable segmentation masks without substantial computational overhead (Kriuk et al., 2 May 2025).
