GELAN: Generalized Efficient Layer Aggregation
- GELAN is a lightweight neural network architecture that maximizes information flow by fusing multi-scale features with dynamic, learnable, and identity-preserving connections.
- It employs gradient path planning and multi-branch fusion to maintain both forward spatial cues and robust backward gradient propagation, addressing information bottlenecks.
- GELAN demonstrates high efficiency across diverse tasks, including natural-image object detection, space object detection, and medical image analysis, effectively balancing accuracy, latency, and parameter usage.
The Generalized Efficient Layer Aggregation Network (GELAN) is a lightweight, modular neural network architecture designed to maximize information flow in deep detectors by fusing multi-scale features across depths with high computational efficiency. Originating in the context of YOLOv9 and programmable gradient methods, GELAN generalizes and extends prior aggregation paradigms—such as ELAN and CSP—by introducing dynamic, learnable fusion pathways and identity-preserving residual connections. In practice, GELAN achieves state-of-the-art performance in diverse detection settings ranging from object detection in natural imagery to space object detection and medical image analysis, while offering a favorable trade-off among accuracy, latency, and parameter utilization compared to traditional depthwise and path-agnostic designs (Wang et al., 2024; Zhang et al., 2024; Youwai et al., 2024; Balakrishnan et al., 2024; Zhang et al., 3 May 2025).
1. Theoretical Motivations and Network Design Principles
GELAN is motivated by the information bottleneck principle, which highlights that mutual information between input and output decreases as feature maps undergo layerwise transformations, risking semantic information collapse and gradient attenuation. This leads to unreliable weight updates in deep architectures when gradients cannot be effectively transported back to earlier layers (Wang et al., 2024). To address this, GELAN employs:
- Gradient path planning: Structured, multiple, identity-augmented paths from deep to shallow layers, ensuring all gradients retain high mutual information with the input.
- Multi-branch fusion: Aggregation of branch outputs at each stage, blending both local (low-level) and global (high-level) semantics.
The design provides both forward robustness (preserving spatial and semantic cues) and backward robustness (preserving gradient magnitudes for effective training).
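The backward-robustness claim can be made concrete with a toy derivative check (a sketch, not from the cited papers): for a residual block $y = f(x) + x$, the gradient is $f'(x) + 1$, so it stays bounded away from zero even when $f'(x)$ has nearly vanished.

```python
# Sketch: why identity connections keep gradients alive.
# For y = f(x) + x, dy/dx = f'(x) + 1, so the gradient reaching earlier
# layers is bounded away from zero even when f'(x) is tiny.

def f(x):
    # Toy transformation with a very small slope, simulating a
    # near-saturated layer whose own gradient has almost vanished.
    return 0.001 * x

def grad(fn, x, eps=1e-6):
    # Central finite difference.
    return (fn(x + eps) - fn(x - eps)) / (2 * eps)

plain = lambda x: f(x)          # plain path: gradient ~ 0.001
residual = lambda x: f(x) + x   # identity-augmented path: gradient ~ 1.001

print(round(grad(plain, 2.0), 4))     # ~0.001
print(round(grad(residual, 2.0), 4))  # ~1.001
```

The identity term contributes a constant 1 to the derivative, which is exactly the mechanism gradient path planning exploits to keep updates to shallow layers reliable.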
2. GELAN Core Architecture and Aggregation Mechanics
GELAN’s backbone and neck are assembled from “CSP–ELAN” blocks or related ELAN-style aggregation modules, which follow a multi-branch, split-transform-merge strategy. The main workflow in canonical GELAN is as follows (Wang et al., 2024, Zhang et al., 2024, Youwai et al., 2024):
- Input Split: For input tensor $X$, split the channels into $K$ parts $X_1, \dots, X_K$.
- Parallel Transformation: Feed $X_2, \dots, X_K$ through convolutional sub-branches $f_i$, each with identity connections: $Y_i = f_i(X_i) + X_i$ for $i = 2, \dots, K$.
- Concatenation and Reduction: Concatenate $X_1$ and all sub-branch outputs, then project via a $1 \times 1$ convolution back to the desired channel width: $Y = \mathrm{Conv}_{1\times 1}(\mathrm{Concat}[X_1, Y_2, \dots, Y_K])$.
- CSP Split-Merge (optional): Repeat above in a cross-stage partial (CSP) merge with split redundancy control.
In the neck and head, GELAN modules aggregate feature maps from multiple depth-resolved paths (e.g., three scales for detection) through concatenation and conv reduction. The “generalized” aspect denotes arbitrarily flexible input branch counts and channel ratios.
Block pseudocode as used in YOLO9tr (Youwai et al., 2024):
```python
def GELAN_Block(X, C_out, K=4):
    split_channels = C_out // K
    branches = [SiLU(BN(Conv3x3(X, split_channels))) for _ in range(K)]
    Z = Concat(branches, axis=channel)
    Y = SiLU(BN(Conv1x1(Z, out_channels=C_out)))
    return Y + X if X.shape[1] == C_out else Y
```
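The channel arithmetic of this block can be verified with a shape-only sketch (layer operations replaced by stubs that track channel counts; the function names are illustrative, not from the cited papers):

```python
# Shape-only sketch of the GELAN block: conv/BN/SiLU layers are replaced
# by stubs that track channel counts, to verify that the
# split-transform-merge arithmetic closes. Illustrative, not official code.

def conv(channels_in, channels_out):
    # A convolution maps C_in channels to C_out channels (spatial dims ignored).
    return channels_out

def gelan_block_channels(c_in, c_out, k=4):
    split = c_out // k
    # K parallel 3x3 branches, each producing `split` channels.
    branches = [conv(c_in, split) for _ in range(k)]
    concat = sum(branches)            # channel-wise concatenation
    y = conv(concat, c_out)           # 1x1 projection back to C_out
    residual_ok = (c_in == c_out)     # identity shortcut only if shapes match
    return y, residual_ok

print(gelan_block_channels(64, 64))   # (64, True)
print(gelan_block_channels(32, 64))   # (64, False): no residual add
```

With K=4 and C_out=64 each branch carries 16 channels, their concatenation restores 64, and the 1×1 projection leaves the width unchanged—matching the `Y + X` guard in the pseudocode above.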
3. Layer Aggregation and Programmable Gradient Information (PGI)
A distinguishing GELAN feature is its support for programmable gradient information (PGI), enabling customized backward signal propagation to combat the loss of trainability found in conventional architectures (Wang et al., 2024). Key mechanisms include:
- Auxiliary reversible branch: Defines operations $r_\psi$ (forward) and $r_\psi^{-1}$ (inverse) such that $r_\psi^{-1}(r_\psi(X)) = X$, thereby preserving full input information in auxiliary losses and making all auxiliary gradients reliable.
- Multi-level heads and aggregation: At each semantic level $\ell$, gradients $g_\ell$ are computed; a small aggregation module forms the final auxiliary gradient as a convex combination: $g_{\mathrm{aux}} = \sum_{\ell} \alpha_\ell\, g_\ell$ with $\alpha_\ell \ge 0$ and $\sum_{\ell} \alpha_\ell = 1$.
- Programmable update rule: $\theta \leftarrow \theta - \eta \left( g_{\mathrm{main}} + \lambda\, g_{\mathrm{aux}} \right)$, where $\eta$ is the learning rate and $\lambda$ balances the auxiliary signal against the main path.
This construction ensures that gradients from auxiliary objectives and main path are always blended in a way that prevents collapse, thus enabling effective end-to-end training from scratch.
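A numeric sketch of the gradient blend described above (a hedged reconstruction: the weights `alpha`, the auxiliary coefficient `lam`, and the learning rate `eta` are illustrative symbols, not values from the paper):

```python
# Sketch of the PGI-style update: per-level auxiliary gradients g_l are
# combined convexly with weights alpha_l, then blended with the main-path
# gradient before the parameter step. Values are purely illustrative.

def blended_update(theta, g_main, g_levels, alpha, lam=0.5, eta=0.1):
    assert abs(sum(alpha) - 1.0) < 1e-9          # convex combination
    g_aux = sum(a * g for a, g in zip(alpha, g_levels))
    return theta - eta * (g_main + lam * g_aux)

# Three semantic levels contribute auxiliary gradients:
theta = blended_update(theta=1.0, g_main=0.4,
                       g_levels=[0.2, 0.6, 0.2],
                       alpha=[0.25, 0.5, 0.25])
print(round(theta, 6))  # 1.0 - 0.1 * (0.4 + 0.5 * 0.4) = 0.94
```

Because the auxiliary term is a convex combination, its magnitude never exceeds the largest per-level gradient, so the blend cannot swamp the main-path signal.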
4. Variants and Extensions: ViT and SE Integration
GELAN has been widely extended with architectural innovations for resource-constrained, low-latency scenarios:
a. Vision Transformer Paths
- GELAN-ViT and GELAN-RepViT (Zhang et al., 2024; Zhang et al., 3 May 2025): integrate compact Vision Transformer encoder paths into the GELAN backbone, adding global self-attention context alongside the convolutional aggregation; the RepViT variant uses reparameterizable blocks to lower inference cost for onboard space object detection.
b. Squeeze-and-Excitation (SE) Fusion
- GELAN-ViT-SE (Zhang et al., 3 May 2025):
- Fuses SE blocks into each RepNCSPELAN4 aggregator, recalibrating channel responses adaptively following global average pooling and two-layer MLP (as in Eq. 4–5 of the paper).
- Yields both higher accuracy (mAP50↑, mAP50:95↑) and reduced GFLOPs/power compared to plain GELAN and ViT-augmented GELAN.
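The SE recalibration folded into each aggregator follows the standard squeeze-and-excitation pattern: global average pooling per channel, a two-layer MLP gate, and channel-wise rescaling. A minimal pure-Python sketch (toy fixed weights; the real module learns them, per Eqs. 4–5 of Zhang et al., 3 May 2025):

```python
# Minimal SE-block sketch: squeeze (global average pool per channel) ->
# two-layer MLP (ReLU bottleneck, sigmoid gate) -> channel-wise rescale.
# Weights are fixed toy values; the real module learns them.
import math

def squeeze(x):
    # x: list of C channels, each an HxW nested list -> per-channel mean.
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]

def excite(z, w1, w2):
    # ReLU bottleneck, then a sigmoid gate per channel.
    h = [max(0.0, sum(wi * zi for wi, zi in zip(row, z))) for row in w1]
    return [1 / (1 + math.exp(-sum(wi * hi for wi, hi in zip(row, h))))
            for row in w2]

def se_block(x, w1, w2):
    s = excite(squeeze(x), w1, w2)
    return [[[v * s[c] for v in row] for row in x[c]] for c in range(len(x))]

# Two 2x2 channels; bottleneck of width 1 (illustrative weights).
x = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
w1 = [[0.5, 0.5]]       # C=2 -> hidden=1
w2 = [[1.0], [-1.0]]    # hidden=1 -> C=2
y = se_block(x, w1, w2)
print([round(y[0][0][0], 3), round(y[1][0][0], 3)])  # channel 0 boosted, channel 1 suppressed
```

The gate amplifies channels the MLP scores highly and attenuates the rest, which is the adaptive recalibration the SE fusion contributes on top of plain GELAN aggregation.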
c. RepVGG and SPP Integration
- RepVGG-GELAN (Balakrishnan et al., 2024): replaces standard convolutions in the GELAN aggregators with RepVGG-style multi-branch blocks that re-parameterize into single convolutions at inference time, and integrates spatial pyramid pooling (SPP) for multi-scale context, targeting real-time brain tumour detection.
5. Empirical Analysis: Computational Efficiency and Performance
GELAN exhibits high resource efficiency, outstripping prior depthwise- and ELAN-only based models across several metrics (Wang et al., 2024, Zhang et al., 2024, Youwai et al., 2024, Zhang et al., 3 May 2025). Key findings:
| Model | Params (M) | GFLOPs | mAP50 | mAP50:95 | Context / Dataset |
|---|---|---|---|---|---|
| GELAN-S | 7.1 | 26.4 | - | 46.7% | COCO, train-from-scratch |
| GELAN-M | 20.0 | 76.3 | - | 51.1% | COCO |
| GELAN-C | 25.3 | 102.1 | - | 52.5% | COCO |
| GELAN-E | 57.3 | 189.0 | - | 55.0% | COCO |
| GELAN-ViT | 1.3–10.2* | 5.7 | ≥0.947 | - | SOD, VOC2012 |
| GELAN-ViT-SE | - | 5.6 | 0.751 | 0.274 | SOD |
| YOLOv9-S | 7.09 | 26.39 | - | 46.8% | COCO |
| RepVGG-GELAN | 25.4 | 240.7 | 0.970 | 0.723 | Medical / Brain Tumor |
*Parameter count for GELAN-ViT: as reported in (Zhang et al., 2024), only 1.3M for tiny models.
Notable trends:
- GELAN-S achieves comparable or superior AP at roughly 12% lower FLOPs than depthwise-convolution-based designs (Wang et al., 2024).
- GELAN-ViT and GELAN-RepViT reduce GFLOPs by 47% versus YOLOv9-t while maintaining or increasing mAP (Zhang et al., 2024).
- GELAN-ViT-SE lowers both FLOPs and peak power consumption (7.3→5.6 GFLOPs, 2080.7→2028.7 mW) while boosting mAP50:95 by ~0.8% (Zhang et al., 3 May 2025).
- Integrating GELAN in RepVGG-GELAN yields AP50 increases of +2.54 percentage points over strong baselines at a reduced parameter budget (Balakrishnan et al., 2024).
6. Applications and Design for Deployment Constraints
GELAN’s versatility is demonstrated across domains:
- Space object detection: Compact GELAN-RepViT models are suitable for CubeSat onboard accelerators (≤8 GB memory, ≤2 TFLOPS, >5 FPS at 640×640), addressing real-time collision risk (Zhang et al., 2024, Zhang et al., 3 May 2025).
- Road pavement analysis: In YOLO9tr, GELAN with partial attention achieves high inference rates (≈136 FPS at 10.2M parameters), adapting well to real-time monitoring (Youwai et al., 2024).
- Medical imaging: RepVGG-GELAN improves precision and AP50 while maintaining real-time throughput and a sub-30M parameter regime (Balakrishnan et al., 2024).
Key design tactics under resource constraints include: reducing backbone width/depth, limiting ViT encoder layers, channel-wise aggregation with lightweight 1×1 convolutions, and selective use of attention modules (SE, PSA).
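The preference for 1×1 merge convolutions in these tactics follows from simple parameter arithmetic (a back-of-envelope sketch; channel widths are illustrative):

```python
# Back-of-envelope sketch: why channel-wise aggregation with 1x1 convs is
# the cheap part of a GELAN stage. Standard convolution parameter count is
# C_in * C_out * k * k (biases ignored). Widths here are illustrative.

def conv_params(c_in, c_out, k=1):
    return c_in * c_out * k * k

c = 256
merge = conv_params(c, c, k=1)   # 1x1 channel-mixing projection
full = conv_params(c, c, k=3)    # equivalent-width 3x3 layer
print(merge)          # 65536
print(full // merge)  # 9: the 1x1 merge is 9x cheaper per channel pair
```

The same ratio holds for FLOPs at fixed resolution, which is why GELAN spends its budget on the parallel 3×3 branches and keeps the aggregation step nearly free.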
7. Comparative Analysis and Ablation Findings
Experiments across multiple works reinforce the significance of GELAN’s core features:
- Ablations in YOLO9tr show inserting GELAN modules along with PSA boosts mAP50 by ≈1.5 points relative to ordinary ELAN blocks; single-layer PSA alone often degrades accuracy, confirming GELAN’s centrality (Youwai et al., 2024).
- GELAN generalizes ELAN by supporting arbitrary branch counts, customized channel splits, and residual additions; empirical evidence favors K=4 branches with moderate width multipliers for speed/accuracy trade-off (Youwai et al., 2024).
- ViT and SE hybridizations systematically improve accuracy and/or efficiency, though a small increase in latency may occur (Zhang et al., 3 May 2025).
References
- (Wang et al., 2024): "YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information"
- (Zhang et al., 2024): "Sensing for Space Safety and Sustainability: A Deep Learning Approach with Vision Transformers"
- (Youwai et al., 2024): "YOLO9tr: A Lightweight Model for Pavement Damage Detection Utilizing a Generalized Efficient Layer Aggregation Network and Attention Mechanism"
- (Balakrishnan et al., 2024): "RepVGG-GELAN: Enhanced GELAN with VGG-STYLE ConvNets for Brain Tumour Detection"
- (Zhang et al., 3 May 2025): "Toward Onboard AI-Enabled Solutions to Space Object Detection for Space Sustainability"