
GN-CBLinear: Group Norm for Linear Projection

Updated 17 December 2025
  • The paper demonstrates that GN-CBLinear stabilizes gradient propagation in micro-batch settings by combining group normalization, affine rescaling, and pointwise projection.
  • The module is a lightweight, strictly linear shortcut that preserves reversibility and trains more consistently than BatchNorm-based alternatives in small-batch settings.
  • Empirical results on remote sensing datasets show modest mAP improvements and enhanced convergence stability in LiM-YOLO architectures.

The Group Normalized Convolutional Block for Linear Projection (GN-CBLinear) is a lightweight, batch-size–agnostic auxiliary normalization module designed for stable, efficient gradient propagation in convolutional neural networks under micro-batch regimes. Integrated into the LiM-YOLO detector for ship detection in optical remote sensing imagery, GN-CBLinear provides a linear, reversible “shortcut” enabling robust gradient flow without relying on mini-batch–dependent statistics, addressing the instability commonly associated with BatchNorm in small-batch settings. GN-CBLinear combines group normalization, affine per-channel scaling, and pointwise convolution to stabilize the auxiliary Programmable Gradient Information (PGI) branches of LiM-YOLO and achieve consistent training performance improvements over conventional approaches (Kim et al., 10 Dec 2025).

1. Architectural Composition

GN-CBLinear operates on input tensors $F \in \mathbb{R}^{n \times c \times h \times w}$, with $n$ the batch size, $c$ the number of channels, and $h \times w$ the spatial resolution. Its internal processing comprises three principal components:

  1. Group Normalization: The $c$ channels are partitioned into $G$ groups of size $C_g = c/G$. For each sample and each group $g$, the mean $\mu_g$ and variance $\sigma_g^2$ are computed over the group's channels and all spatial locations, defining per-group normalized activations.
  2. Affine Rescaling: Two learnable vectors $\gamma, \beta \in \mathbb{R}^c$ enable per-channel scaling and shifting of the normalized outputs.
  3. Pointwise Linear Projection: A $1 \times 1$ convolution ($\text{Conv}_{1 \times 1}$) with kernel $W \in \mathbb{R}^{c' \times c \times 1 \times 1}$ and bias $b \in \mathbb{R}^{c'}$ projects the features into a new channel dimension for reintegration into the network's main gradient path.

No nonlinearities are introduced between normalization and the $1 \times 1$ convolution, preserving the block's strict linearity and theoretical invertibility (with the caveat of the $\epsilon$ used for numerical stability).
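
As a concrete illustration, a minimal PyTorch sketch of this composition is given below. The class name, argument defaults, and the use of `nn.GroupNorm` (which bundles the learnable per-channel affine) are implementation assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn as nn

class GNCBLinear(nn.Module):
    """Group normalization + per-channel affine + pointwise linear projection.

    Illustrative sketch only: strictly linear (no activation between the
    normalization and the 1x1 convolution), so the block stays invertible
    up to the epsilon used for numerical stability.
    """
    def __init__(self, in_channels: int, out_channels: int,
                 groups: int = 32, eps: float = 1e-5):
        super().__init__()
        # nn.GroupNorm already includes the learnable gamma/beta affine step
        self.gn = nn.GroupNorm(num_groups=groups, num_channels=in_channels,
                               eps=eps, affine=True)
        # 1x1 convolution: pure channel mixing, stride 1, no padding
        self.proj = nn.Conv2d(in_channels, out_channels,
                              kernel_size=1, stride=1, padding=0, bias=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # No nonlinearity between normalization and projection
        return self.proj(self.gn(x))

# Example: micro-batch input (n = 2) at one pyramid level
x = torch.randn(2, 64, 80, 80)
y = GNCBLinear(64, 128)(x)
print(y.shape)  # torch.Size([2, 128, 80, 80])
```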

2. Mathematical Formulation

For input $F \in \mathbb{R}^{n \times c \times h \times w}$, the channel dimension is divided into $G$ groups of size $C_g = c/G$. For each sample $i$ and each group $g$ ($1 \leq g \leq G$):

  • Group mean (per sample $i$):

$$\mu_{i,g} = \frac{1}{C_g\, h\, w} \sum_{c \in \text{group } g} \sum_{u=1}^{h} \sum_{v=1}^{w} F_{i,c,u,v}$$

  • Group variance (per sample $i$):

$$\sigma_{i,g}^2 = \frac{1}{C_g\, h\, w} \sum_{c \in \text{group } g} \sum_{u=1}^{h} \sum_{v=1}^{w} \left(F_{i,c,u,v} - \mu_{i,g}\right)^2$$

  • Normalization (for $(i, c, u, v)$ with channel $c$ in group $g$):

$$\hat{F}_{i,c,u,v} = \frac{F_{i,c,u,v} - \mu_{i,g}}{\sqrt{\sigma_{i,g}^2 + \epsilon}}$$

  • Affine transformation:

$$\tilde{F}_{i,c,u,v} = \gamma_c \cdot \hat{F}_{i,c,u,v} + \beta_c$$

  • $1 \times 1$ convolution (output to $c'$ channels, $k = 1, \dots, c'$):

$$Y_{i,k,u,v} = \sum_{c=1}^{c} W_{k,c} \cdot \tilde{F}_{i,c,u,v} + b_k$$

This complete process is concisely expressed as:

$$\text{GN-CBLinear}(F) = \text{Conv}_{1 \times 1}\!\left(\gamma \odot \hat{F} + \beta\right)$$
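
The per-sample, per-group statistics can be checked numerically. The NumPy sketch below (toy shapes chosen purely for illustration) follows the formulas above step by step: group statistics, normalization, affine rescaling, and the $1 \times 1$ projection written as a channel-mixing product.

```python
import numpy as np

rng = np.random.default_rng(0)
n, c, h, w = 2, 8, 4, 4        # toy sizes (illustrative)
G, eps = 4, 1e-5               # G groups of C_g = c // G channels each
c_out = 6

F = rng.normal(size=(n, c, h, w))

# Per-sample, per-group mean and variance over (C_g, h, w)
Fg = F.reshape(n, G, c // G, h, w)
mu = Fg.mean(axis=(2, 3, 4), keepdims=True)    # mu_{i,g}
var = Fg.var(axis=(2, 3, 4), keepdims=True)    # sigma_{i,g}^2

F_hat = ((Fg - mu) / np.sqrt(var + eps)).reshape(n, c, h, w)

# Affine rescaling with per-channel gamma, beta
gamma = np.ones((1, c, 1, 1))
beta = np.zeros((1, c, 1, 1))
F_tilde = gamma * F_hat + beta

# Pointwise (1x1) projection to c_out channels: a channel-mixing matmul
W = rng.normal(size=(c_out, c)) * 0.01
b = np.zeros(c_out)
Y = np.einsum('kc,ichw->ikhw', W, F_tilde) + b[None, :, None, None]
print(Y.shape)  # (2, 6, 4, 4)
```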

3. Contextual Comparison to Alternative Normalizers

GN-CBLinear’s use of group normalization distinguishes it from:

| Normalizer Type | Statistic Scope | Batch-Size Sensitivity | Principal Tradeoffs |
| --- | --- | --- | --- |
| BatchNorm (BN) | Across batch | Yes (unstable for $n \lesssim 16$) | High variance in micro-batch; unsuitable for small $n$ |
| LayerNorm (LN) | All channels, per pixel | No | May lose channelwise discrimination |
| InstanceNorm (IN) | Each channel, per sample | No | Removes global context; no inter-channel correlation |
| GroupNorm (GN) | Channel groups, per sample | No | Balances channel structure retention with stability |

GN-CBLinear inherits the batch-size independence and moderate grouping of GN, yielding resilience during micro-batch training while retaining more cross-channel structure than IN and finer channel granularity than LN.
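
This difference is easy to demonstrate with a short sketch (standard torch modules; the shapes are arbitrary): a GroupNorm output for one sample is unchanged whether the sample is processed alone or inside a larger batch, while a training-mode BatchNorm output depends on the rest of the batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x16 = torch.randn(16, 32, 8, 8)
x1 = x16[:1]                       # the same first sample, alone in a batch

gn = nn.GroupNorm(num_groups=8, num_channels=32)
bn = nn.BatchNorm2d(32).train()    # training-mode statistics depend on the batch

# GN: per-sample statistics -> identical result for batch size 1 or 16
print(torch.allclose(gn(x16)[:1], gn(x1)))   # True
# BN: batch statistics -> the same sample is normalized differently
print(torch.allclose(bn(x16)[:1], bn(x1)))   # False (in general)
```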

4. Implementation Specifics

  • Default group count $G = 32$, following Wu & He (2018).
  • Numerical stability constant $\epsilon = 1 \times 10^{-5}$.
  • $1 \times 1$ convolution kernel with stride $= 1$, padding $= 0$.
  • Linear, non-activated shortcut: no nonlinearity between normalization and convolution is employed, to maintain reversibility in the PGI branch.
  • Weight initialization: $\gamma$ initialized to $1$, $W$ initialized near identity with a scaled normal distribution, so the shortcut initially approximates an identity mapping (see the sketch after this list).
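
A sketch of that initialization is given below. The `0.02` noise scale and the assumption that input and output widths match for the identity part are illustrative choices, not values reported in the paper.

```python
import torch
import torch.nn as nn

def init_gn_cblinear(gn: nn.GroupNorm, proj: nn.Conv2d, noise_std: float = 0.02) -> None:
    """Initialize the block to start close to an identity mapping.

    noise_std is an illustrative choice; the paper only states that W is
    initialized near identity with a scaled normal distribution.
    """
    nn.init.ones_(gn.weight)       # gamma = 1
    nn.init.zeros_(gn.bias)        # beta = 0
    c_out, c_in = proj.weight.shape[:2]
    with torch.no_grad():
        proj.weight.normal_(0.0, noise_std)                               # scaled normal noise
        proj.weight += torch.eye(c_out, c_in).view(c_out, c_in, 1, 1)     # identity on the overlap
    nn.init.zeros_(proj.bias)

gn = nn.GroupNorm(num_groups=32, num_channels=64)
proj = nn.Conv2d(64, 64, kernel_size=1)
init_gn_cblinear(gn, proj)

x = torch.randn(2, 64, 32, 32)
y = proj(gn(x))
# Near-identity start: the output stays close to the normalized input
print((y - gn(x)).abs().mean().item())
```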

5. Flow in Forward and Backward Passes

Forward Pass: For each sample and group, GN computes the normalization statistics, applies the affine scaling and shift, and then the pointwise convolution. All operations in this block are strictly linear, and the mapping remains invertible except for the influence of $\epsilon$.

Backward Pass: Gradients propagate through the $1 \times 1$ convolution and subsequently through the affine and normalization layers. Owing to GN's reliance solely on per-sample statistics, gradient variance remains bounded even when $n$ is small, circumventing the instability typical of BN in micro-batch contexts. This characteristic stabilizes loss and improves convergence behavior in deep YOLO models for remote sensing.
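
A minimal check of the backward path (a sketch using standard torch modules in the same GN → 1×1 composition; shapes are illustrative): even with a batch of one, gradients reach the input, the affine parameters, and the projection kernel without touching any batch statistics.

```python
import torch
import torch.nn as nn

# The same GN -> 1x1 conv composition, written inline for a micro-batch check
block = nn.Sequential(
    nn.GroupNorm(num_groups=32, num_channels=64),
    nn.Conv2d(64, 128, kernel_size=1),
)
x = torch.randn(1, 64, 40, 40, requires_grad=True)   # batch of one
loss = block(x).pow(2).mean()
loss.backward()

# Gradients reach the input, the GN affine parameters, and the projection
print(x.grad.abs().mean().item() > 0)     # True
print(block[0].weight.grad is not None)   # gamma gradient present
print(block[1].weight.grad is not None)   # 1x1 kernel gradient present
```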

6. Integration with LiM-YOLO Architecture

GN-CBLinear is deployed at each of the P2, P3, and P4 auxiliary PGI branches in LiM-YOLO, specifically inserted before their outputs merge with the main backbone features. With P5 layers pruned in LiM-YOLO, only P2–P4 receive GN-CBLinear modules. This placement targets the pyramid levels where normalization robustness is most critical, consistent with the model’s pyramid level shift strategy for resolving fine-scale maritime objects (Kim et al., 10 Dec 2025).
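
A hypothetical sketch of this placement is shown below; the pyramid shapes, channel widths, and wiring are assumptions for illustration and do not reproduce the actual LiM-YOLO implementation.

```python
import torch
import torch.nn as nn

def gn_cblinear(c_in: int, c_out: int, groups: int = 32) -> nn.Module:
    # GN -> 1x1 conv composition, as in the sketch of Section 1 (illustrative)
    return nn.Sequential(
        nn.GroupNorm(groups, c_in),
        nn.Conv2d(c_in, c_out, kernel_size=1),
    )

# Illustrative P2-P4 auxiliary features (shapes and widths are assumptions)
feats = {
    "P2": torch.randn(2, 64, 160, 160),
    "P3": torch.randn(2, 128, 80, 80),
    "P4": torch.randn(2, 256, 40, 40),
}
# One GN-CBLinear per auxiliary PGI branch, applied before it rejoins the main path
branches = nn.ModuleDict({k: gn_cblinear(v.shape[1], v.shape[1]) for k, v in feats.items()})
aux = {k: branches[k](v) for k, v in feats.items()}
for k, y in aux.items():
    print(k, tuple(y.shape))   # spatial size preserved, channels projected
```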

7. Empirical Effectiveness and Observed Gains

Ablation studies with batch size $= 2$, the Adam optimizer, and 100-epoch training on the SODA-A, DOTA-v1.5, FAIR1M, and ShipRSImageNet-v1 datasets demonstrate measurable improvements in mean Average Precision ($\mathrm{mAP}_{50\text{–}95}$) relative to the unnormalized CBLinear:

| Dataset | Baseline mAP | GN-CBLinear mAP | Gain |
| --- | --- | --- | --- |
| SODA-A | 0.660 | 0.662 | +0.2 pp |
| DOTA-v1.5 | 0.744 | 0.750 | +0.6 pp |
| FAIR1M | 0.301 | 0.302 | +0.1 pp |
| ShipRSImageNet-v1 | 0.428 | 0.448 | +2.0 pp |

Training metrics reveal that including GN-CBLinear reduces oscillations in the loss and yields smoother, more stable convergence from the initial epoch. These findings indicate that GN-CBLinear stabilizes micro-batch training while maintaining the linear-shortcut architecture of the PGI branch, yielding modest but consistent improvements in stability and detection accuracy (Kim et al., 10 Dec 2025).
