
GN-CBLinear: Group Norm for Linear Projection

Updated 17 December 2025
  • The paper demonstrates that GN-CBLinear stabilizes gradient propagation in micro-batch settings by combining group normalization, affine rescaling, and pointwise projection.
  • The module is a lightweight, strictly linear shortcut that preserves reversibility and trains more consistently than BatchNorm-based alternatives in small-batch settings.
  • Empirical results on remote sensing datasets show modest mAP improvements and enhanced convergence stability in LiM-YOLO architectures.

The Group Normalized Convolutional Block for Linear Projection (GN-CBLinear) is a lightweight, batch-size–agnostic auxiliary normalization module designed for stable, efficient gradient propagation in convolutional neural networks under micro-batch regimes. Integrated into the LiM-YOLO detector for ship detection in optical remote sensing imagery, GN-CBLinear provides a linear, reversible “shortcut” enabling robust gradient flow without relying on mini-batch–dependent statistics, addressing the instability commonly associated with BatchNorm in small-batch settings. GN-CBLinear combines group normalization, affine per-channel scaling, and pointwise convolution to stabilize the auxiliary Programmable Gradient Information (PGI) branches of LiM-YOLO and achieve consistent training performance improvements over conventional approaches (Kim et al., 10 Dec 2025).

1. Architectural Composition

GN-CBLinear operates on input tensors $F \in \mathbb{R}^{n \times c \times h \times w}$, with $n$ the batch size, $c$ the number of channels, and $h \times w$ the spatial resolution. Its internal processing comprises three principal components:

  1. Group Normalization: The $c$ channels are partitioned into $G$ groups of size $C_g = c/G$. For each sample and each group $g$, the mean $\mu_g$ and variance $\sigma_g^2$ are computed over the group's channels and all spatial locations, defining per-group normalized activations.
  2. Affine Rescaling: Two learnable vectors $\gamma, \beta \in \mathbb{R}^c$ enable per-channel scaling and shifting of the normalized outputs.
  3. Pointwise Linear Projection: A $1 \times 1$ convolution ($\text{Conv}_{1 \times 1}$) with kernel $W \in \mathbb{R}^{c' \times c \times 1 \times 1}$ and bias $b \in \mathbb{R}^{c'}$ projects the features into a new channel dimension for reintegration into the network's main gradient path.

No nonlinearities are introduced between normalization and the $1 \times 1$ convolution, preserving the block's strict linearity and theoretical invertibility (with the caveat of the $\epsilon$ used for numerical stability).
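
As a concrete illustration, a minimal PyTorch sketch of this composition is given below. The class name, argument defaults, and the use of `nn.GroupNorm` (which bundles the learnable per-channel affine) are implementation assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn as nn

class GNCBLinear(nn.Module):
    """Group normalization + per-channel affine + pointwise linear projection.

    Illustrative sketch only: strictly linear (no activation between the
    normalization and the 1x1 convolution), so the block stays invertible
    up to the epsilon used for numerical stability.
    """
    def __init__(self, in_channels: int, out_channels: int,
                 groups: int = 32, eps: float = 1e-5):
        super().__init__()
        # nn.GroupNorm already includes the learnable gamma/beta affine step
        self.gn = nn.GroupNorm(num_groups=groups, num_channels=in_channels,
                               eps=eps, affine=True)
        # 1x1 convolution: pure channel mixing, stride 1, no padding
        self.proj = nn.Conv2d(in_channels, out_channels,
                              kernel_size=1, stride=1, padding=0, bias=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # No nonlinearity between normalization and projection
        return self.proj(self.gn(x))

# Example: micro-batch input (n = 2) at one pyramid level
x = torch.randn(2, 64, 80, 80)
y = GNCBLinear(64, 128)(x)
print(y.shape)  # torch.Size([2, 128, 80, 80])
```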

2. Mathematical Formulation

For input $F \in \mathbb{R}^{n \times c \times h \times w}$, the channel dimension is divided into $G$ groups of size $C_g = c/G$. For each sample $i$ and each group $g$ ($1 \leq g \leq G$):

  • Group mean (per sample $i$):

$$\mu_{i,g} = \frac{1}{C_g\, h\, w} \sum_{c \in \text{group } g} \sum_{u=1}^{h} \sum_{v=1}^{w} F_{i,c,u,v}$$

  • Group variance (per sample $i$):

$$\sigma_{i,g}^2 = \frac{1}{C_g\, h\, w} \sum_{c \in \text{group } g} \sum_{u=1}^{h} \sum_{v=1}^{w} \left(F_{i,c,u,v} - \mu_{i,g}\right)^2$$

  • Normalization (for $(i, c, u, v)$ with channel $c$ in group $g$):

$$\hat{F}_{i,c,u,v} = \frac{F_{i,c,u,v} - \mu_{i,g}}{\sqrt{\sigma_{i,g}^2 + \epsilon}}$$

  • Affine transformation:

$$\tilde{F}_{i,c,u,v} = \gamma_c \cdot \hat{F}_{i,c,u,v} + \beta_c$$

  • $1 \times 1$ convolution (output to $c'$ channels, $k = 1, \dots, c'$):

$$Y_{i,k,u,v} = \sum_{c=1}^{c} W_{k,c} \cdot \tilde{F}_{i,c,u,v} + b_k$$

This complete process is concisely expressed as:

$$\text{GN-CBLinear}(F) = \text{Conv}_{1 \times 1}\!\left(\gamma \odot \hat{F} + \beta\right)$$
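
The per-sample, per-group statistics can be checked numerically. The NumPy sketch below (toy shapes chosen purely for illustration) follows the formulas above step by step: group statistics, normalization, affine rescaling, and the $1 \times 1$ projection written as a channel-mixing product.

```python
import numpy as np

rng = np.random.default_rng(0)
n, c, h, w = 2, 8, 4, 4        # toy sizes (illustrative)
G, eps = 4, 1e-5               # G groups of C_g = c // G channels each
c_out = 6

F = rng.normal(size=(n, c, h, w))

# Per-sample, per-group mean and variance over (C_g, h, w)
Fg = F.reshape(n, G, c // G, h, w)
mu = Fg.mean(axis=(2, 3, 4), keepdims=True)    # mu_{i,g}
var = Fg.var(axis=(2, 3, 4), keepdims=True)    # sigma_{i,g}^2

F_hat = ((Fg - mu) / np.sqrt(var + eps)).reshape(n, c, h, w)

# Affine rescaling with per-channel gamma, beta
gamma = np.ones((1, c, 1, 1))
beta = np.zeros((1, c, 1, 1))
F_tilde = gamma * F_hat + beta

# Pointwise (1x1) projection to c_out channels: a channel-mixing matmul
W = rng.normal(size=(c_out, c)) * 0.01
b = np.zeros(c_out)
Y = np.einsum('kc,ichw->ikhw', W, F_tilde) + b[None, :, None, None]
print(Y.shape)  # (2, 6, 4, 4)
```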

3. Contextual Comparison to Alternative Normalizers

GN-CBLinear’s use of group normalization distinguishes it from:

| Normalizer Type | Statistic Scope | Batch-Size Sensitivity | Principal Tradeoffs |
| --- | --- | --- | --- |
| BatchNorm (BN) | Across batch | Yes (unstable for $n \lesssim 16$) | High variance in micro-batch; unsuitable for small $n$ |
| LayerNorm (LN) | All channels, per pixel | No | May lose channelwise discrimination |
| InstanceNorm (IN) | Each channel, per sample | No | Removes global context; no inter-channel correlation |
| GroupNorm (GN) | Channel groups, per sample | No | Balances channel structure retention with stability |

GN-CBLinear inherits the batch-size independence and moderate grouping of GN, yielding resilience during micro-batch training while retaining more cross-channel structure than IN and finer channel granularity than LN.
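
This difference is easy to demonstrate with a short sketch (standard torch modules; the shapes are arbitrary): a GroupNorm output for one sample is unchanged whether the sample is processed alone or inside a larger batch, while a training-mode BatchNorm output depends on the rest of the batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x16 = torch.randn(16, 32, 8, 8)
x1 = x16[:1]                       # the same first sample, alone in a batch

gn = nn.GroupNorm(num_groups=8, num_channels=32)
bn = nn.BatchNorm2d(32).train()    # training-mode statistics depend on the batch

# GN: per-sample statistics -> identical result for batch size 1 or 16
print(torch.allclose(gn(x16)[:1], gn(x1)))   # True
# BN: batch statistics -> the same sample is normalized differently
print(torch.allclose(bn(x16)[:1], bn(x1)))   # False (in general)
```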

4. Implementation Specifics

  • Default group count $G = 32$, following Wu & He (2018).
  • Numerical stability constant $\epsilon = 1 \times 10^{-5}$.
  • $1 \times 1$ convolution kernel with stride $= 1$, padding $= 0$.
  • Linear, non-activated shortcut: no nonlinearity between normalization and convolution is employed, to maintain reversibility in the PGI branch.
  • Weight initialization: $\gamma$ initialized to $1$, $W$ initialized near identity with a scaled normal distribution, so the shortcut initially approximates an identity mapping (see the sketch after this list).
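
A sketch of that initialization is given below. The `0.02` noise scale and the assumption that input and output widths match for the identity part are illustrative choices, not values reported in the paper.

```python
import torch
import torch.nn as nn

def init_gn_cblinear(gn: nn.GroupNorm, proj: nn.Conv2d, noise_std: float = 0.02) -> None:
    """Initialize the block to start close to an identity mapping.

    noise_std is an illustrative choice; the paper only states that W is
    initialized near identity with a scaled normal distribution.
    """
    nn.init.ones_(gn.weight)       # gamma = 1
    nn.init.zeros_(gn.bias)        # beta = 0
    c_out, c_in = proj.weight.shape[:2]
    with torch.no_grad():
        proj.weight.normal_(0.0, noise_std)                               # scaled normal noise
        proj.weight += torch.eye(c_out, c_in).view(c_out, c_in, 1, 1)     # identity on the overlap
    nn.init.zeros_(proj.bias)

gn = nn.GroupNorm(num_groups=32, num_channels=64)
proj = nn.Conv2d(64, 64, kernel_size=1)
init_gn_cblinear(gn, proj)

x = torch.randn(2, 64, 32, 32)
y = proj(gn(x))
# Near-identity start: the output stays close to the normalized input
print((y - gn(x)).abs().mean().item())
```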

5. Flow in Forward and Backward Passes

Forward Pass: For each sample and group, GN computes the normalization statistics, applies the affine scaling and shift, and then the pointwise convolution. All operations in this block are strictly linear, and the mapping remains invertible except for the influence of $\epsilon$.

Backward Pass: Gradients propagate through the $1 \times 1$ convolution and subsequently through the affine and normalization layers. Owing to GN's reliance solely on per-sample statistics, gradient variance remains bounded even when $n$ is small, circumventing the instability typical of BN in micro-batch contexts. This characteristic stabilizes loss and improves convergence behavior in deep YOLO models for remote sensing.
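
A minimal check of the backward path (a sketch using standard torch modules in the same GN → 1×1 composition; shapes are illustrative): even with a batch of one, gradients reach the input, the affine parameters, and the projection kernel without touching any batch statistics.

```python
import torch
import torch.nn as nn

# The same GN -> 1x1 conv composition, written inline for a micro-batch check
block = nn.Sequential(
    nn.GroupNorm(num_groups=32, num_channels=64),
    nn.Conv2d(64, 128, kernel_size=1),
)
x = torch.randn(1, 64, 40, 40, requires_grad=True)   # batch of one
loss = block(x).pow(2).mean()
loss.backward()

# Gradients reach the input, the GN affine parameters, and the projection
print(x.grad.abs().mean().item() > 0)     # True
print(block[0].weight.grad is not None)   # gamma gradient present
print(block[1].weight.grad is not None)   # 1x1 kernel gradient present
```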

6. Integration with LiM-YOLO Architecture

GN-CBLinear is deployed at each of the P2, P3, and P4 auxiliary PGI branches in LiM-YOLO, specifically inserted before their outputs merge with the main backbone features. With P5 layers pruned in LiM-YOLO, only P2–P4 receive GN-CBLinear modules. This placement targets the pyramid levels where normalization robustness is most critical, consistent with the model’s pyramid level shift strategy for resolving fine-scale maritime objects (Kim et al., 10 Dec 2025).
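
A hypothetical sketch of this placement is shown below; the pyramid shapes, channel widths, and wiring are assumptions for illustration and do not reproduce the actual LiM-YOLO implementation.

```python
import torch
import torch.nn as nn

def gn_cblinear(c_in: int, c_out: int, groups: int = 32) -> nn.Module:
    # GN -> 1x1 conv composition, as in the sketch of Section 1 (illustrative)
    return nn.Sequential(
        nn.GroupNorm(groups, c_in),
        nn.Conv2d(c_in, c_out, kernel_size=1),
    )

# Illustrative P2-P4 auxiliary features (shapes and widths are assumptions)
feats = {
    "P2": torch.randn(2, 64, 160, 160),
    "P3": torch.randn(2, 128, 80, 80),
    "P4": torch.randn(2, 256, 40, 40),
}
# One GN-CBLinear per auxiliary PGI branch, applied before it rejoins the main path
branches = nn.ModuleDict({k: gn_cblinear(v.shape[1], v.shape[1]) for k, v in feats.items()})
aux = {k: branches[k](v) for k, v in feats.items()}
for k, y in aux.items():
    print(k, tuple(y.shape))   # spatial size preserved, channels projected
```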

7. Empirical Effectiveness and Observed Gains

Ablation studies with batch size $= 2$, the Adam optimizer, and 100-epoch training on the SODA-A, DOTA-v1.5, FAIR1M, and ShipRSImageNet-v1 datasets demonstrate measurable improvements in mean Average Precision ($\mathrm{mAP}_{50\text{–}95}$) relative to the unnormalized CBLinear:

| Dataset | Baseline mAP | GN-CBLinear mAP | Gain |
| --- | --- | --- | --- |
| SODA-A | 0.660 | 0.662 | +0.2 pp |
| DOTA-v1.5 | 0.744 | 0.750 | +0.6 pp |
| FAIR1M | 0.301 | 0.302 | +0.1 pp |
| ShipRSImageNet-v1 | 0.428 | 0.448 | +2.0 pp |

Training metrics reveal that including GN-CBLinear reduces oscillations in the loss and yields smoother, more stable convergence from the initial epoch. These findings indicate that GN-CBLinear stabilizes micro-batch training while maintaining the linear-shortcut architecture of the PGI branch, yielding modest but consistent improvements in stability and detection accuracy (Kim et al., 10 Dec 2025).
