Recurrent Layer Aggregation in CNNs

Updated 31 May 2026
  • Recurrent Layer Aggregation is a feature reuse mechanism that integrates outputs from previous CNN layers using a compact recurrent state.
  • It achieves linear parameter growth and controlled lag by sharing weights and summarizing past features, enhancing efficiency.
  • Empirical evaluations demonstrate improved performance in classification, detection, and segmentation with only marginal computational overhead.

Recurrent Layer Aggregation (RLA) is a mechanism for feature reuse in deep convolutional neural networks (CNNs) that introduces a parameter-efficient, recurrent aggregation path alongside existing feedforward architectures. By incorporating a compact recurrent state that summarizes information across all previous layers within each resolution stage, RLA achieves effective feature aggregation with linear parameter growth and controlled lag, addressing critical inefficiencies in prior approaches such as DenseNet. RLA modules are compatible with mainstream CNN backbones (ResNet, Xception, MobileNetV2) and have demonstrated empirical improvements on standard benchmarks in image classification, object detection, and instance segmentation (Zhao et al., 2021).

1. Motivation and Background

Layer aggregation refers to the reuse of activations from earlier layers to inform computation at the current layer, formalized as producing new activations $A^t = g^t(x^{t-1}, x^{t-2}, \ldots, x^0)$ and $x^t = f^t(A^{t-1}, x^{t-1})$. DenseNet exemplifies this mechanism via concatenation: each layer receives features from all preceding layers and processes them through learned convolutions. However, DenseNet's approach incurs $\mathcal{O}(L^2)$ parameter growth per $L$-layer stage and leads to substantial redundancy, as low-lag connections dominate and longer-lag contributions diminish empirically.

RLA was developed to resolve this by:

  • Replacing dense skip-connections with a single compact hidden state $h^t$ (the "recurrent aggregator") that summarizes all prior layer outputs,
  • Employing weight sharing (parameter tying) across depth, and
  • Achieving $\mathcal{O}(L)$ parameter and computational complexity per stage.

This design yields an aggregation effect mathematically analogous to an ARMA(1,1) process along the network depth axis, giving the RLA module better control over historical information decay while maintaining efficiency (Zhao et al., 2021).
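To make the quadratic-versus-linear contrast concrete, the sketch below counts aggregation weights under simplifying assumptions that are not taken from the source: every layer emits `c` channels, the dense variant gives layer t its own bias-free 1×1 convolution over the concatenation of all t earlier outputs, and the recurrent variant stores one shared `g1` (1×1, c to k channels) and one shared `g2` (3×3, k to k channels) while each layer reads k extra state channels. The function names and the values of `c` and `k` are hypothetical.

```python
def dense_aggregation_params(num_layers: int, c: int) -> int:
    """Weights for DenseNet-style aggregation: layer t applies its own 1x1
    convolution to the concatenation of all t earlier c-channel outputs."""
    return sum(t * c * c for t in range(1, num_layers + 1))


def recurrent_aggregation_params(num_layers: int, c: int, k: int) -> int:
    """Weights for a shared recurrent aggregator with a k-channel state."""
    shared = c * k + 9 * k * k   # g1 (1x1, c -> k) and g2 (3x3, k -> k), stored once
    per_layer = k * c            # each layer reads k extra input channels (1x1 cost)
    return shared + num_layers * per_layer


# Illustrative values only: c=64 feature channels, k=16 state channels.
for num_layers in (4, 8, 16, 32):
    dense = dense_aggregation_params(num_layers, c=64)
    recurrent = recurrent_aggregation_params(num_layers, c=64, k=16)
    print(f"L={num_layers:2d}  dense={dense:8d}  recurrent={recurrent:6d}")
```

Doubling the depth roughly quadruples the dense count, whereas the recurrent count grows by a fixed amount per added layer because the shared terms are paid once per stage.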

2. Structural Design and Layerwise Operation

Within a typical residual block augmented with RLA, two parallel computational paths operate:

  • Residual path: Standard two- or three-convolution residual unit produces $y^t$, yielding $x^t = x^{t-1} + y^t$.
  • Recurrent aggregator path:
    • $g_1$: A shared $1\times1$ convolution compresses $y^t$ to the channel width of the hidden state.
    • $g_2$: A shared $3\times3$ convolution (with batch normalization and $\tanh$) updates the hidden state $h^t$.
    • The recurrent state is updated by $h^t = g_2[g_1(y^t) + h^{t-1}]$.

This process forms a single, compact "memory" (hidden state) that propagates through every block in a given stage. At input, $h^0 = 0$; at the stage boundary, $h^t$ is spatially downsampled via average pooling to match the changed resolution (e.g., after strided convolutions).

At the network's terminus for classification, the final hidden state $h$ is concatenated with the final feature map $x$ before the final fully-connected classifier. For detection or segmentation with FPN, the hidden state is discarded after the backbone (Zhao et al., 2021).
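The two-path block and the stage-level handling of the hidden state can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' implementation: it assumes the update $h^t = g_2[g_1(y^t) + h^{t-1}]$, a two-convolution residual unit that reads the concatenation of $x^{t-1}$ and $h^{t-1}$, a BN→tanh→Conv ordering for `g2`, and pooling before the final concatenation. The class names (`RecurrentAggregator`, `ResidualUnit`, `RLAStage`) and all channel sizes are hypothetical.

```python
import torch
import torch.nn as nn


class RecurrentAggregator(nn.Module):
    """Shared g1/g2 pair for one stage (hypothetical name)."""

    def __init__(self, channels: int, state_channels: int):
        super().__init__()
        # g1: 1x1 convolution that compresses y^t to the state width.
        self.g1 = nn.Conv2d(channels, state_channels, kernel_size=1, bias=False)
        # g2: BN -> tanh -> 3x3 conv. Sharing the BN layer across depth is a
        # simplification made here for brevity.
        self.g2 = nn.Sequential(
            nn.BatchNorm2d(state_channels),
            nn.Tanh(),
            nn.Conv2d(state_channels, state_channels, kernel_size=3, padding=1, bias=False),
        )

    def forward(self, y: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        return self.g2(self.g1(y) + h)  # h^t = g2[g1(y^t) + h^{t-1}]


class ResidualUnit(nn.Module):
    """Two-convolution residual unit that also reads the hidden state."""

    def __init__(self, channels: int, state_channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels + state_channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        return self.body(torch.cat([x, h], dim=1))  # y^t


class RLAStage(nn.Module):
    """One constant-resolution stage: per-block residual units, one shared aggregator."""

    def __init__(self, num_blocks: int, channels: int, state_channels: int):
        super().__init__()
        self.units = nn.ModuleList(
            [ResidualUnit(channels, state_channels) for _ in range(num_blocks)]
        )
        self.aggregator = RecurrentAggregator(channels, state_channels)  # one instance, reused
        self.state_channels = state_channels

    def forward(self, x: torch.Tensor, h: torch.Tensor = None):
        if h is None:  # zero initial state at the network input
            h = x.new_zeros((x.size(0), self.state_channels, x.size(2), x.size(3)))
        for unit in self.units:
            y = unit(x, h)
            h = self.aggregator(y, h)
            x = x + y  # x^t = x^{t-1} + y^t (real ResNets also apply ReLU after the sum)
        return x, h


stage = RLAStage(num_blocks=3, channels=16, state_channels=8)
x, h = stage(torch.randn(2, 16, 32, 32))
h_next = nn.functional.avg_pool2d(h, kernel_size=2)  # state downsampled at a stage boundary
# Classification head input: pooled x concatenated with pooled h (assumed ordering).
features = torch.cat([x.mean(dim=(2, 3)), h.mean(dim=(2, 3))], dim=1)
print(x.shape, h.shape, h_next.shape, features.shape)
```

Because `self.aggregator` is a single module called inside the loop, its weights are tied across every block of the stage, so adding blocks adds residual-unit parameters only.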

3. Mathematical Formulation

The core recursive equations for RLA are:

  • Recurrent state update:

$$h^t = g_2\left[g_1(y^t) + h^{t-1}\right]$$

  • Residual feature update:

$$y^t = f_1^t\left[\mathrm{Concat}(h^{t-1}, x^{t-1})\right]$$

  • Residual output:

$$x^t = x^{t-1} + y^t$$

By recursive unrolling, $h^t$ can be interpreted as an additive aggregation of all past $y^s$'s, filtered through depth by repeated application of $g_2$, resulting in an exponentially decaying influence from older layers. This compressed summary replaces the explicit concatenation in DenseNet and analogous systems, yielding memory efficiency and computational tractability (Zhao et al., 2021).
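As a worked illustration of the unrolling, consider a linearized scalar simplification (an assumption made here for exposition, not the paper's setting) in which $g_1$ acts as multiplication by a constant $\beta$ and $g_2$ as multiplication by a constant $\gamma$, with $h^0 = 0$ and the update $h^t = g_2[g_1(y^t) + h^{t-1}]$. Superscripts on $h$ and $y$ are layer indices, while superscripts on $\gamma$ are powers:

```latex
\begin{aligned}
h^t &= \gamma\,(\beta\, y^t + h^{t-1}) \\
    &= \gamma\beta\, y^t + \gamma^{2}\beta\, y^{t-1} + \gamma^{2} h^{t-2} \\
    &= \sum_{k=0}^{t-1} \gamma^{k+1}\beta\, y^{t-k} + \gamma^{t} h^{0}
     = \beta \sum_{k=0}^{t-1} \gamma^{k+1}\, y^{t-k}.
\end{aligned}
```

Under this simplification the weight on the output from $k$ layers back is $\beta\gamma^{k+1}$, so for $|\gamma| < 1$ the contribution of older layers shrinks geometrically with lag.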

4. Integration with Common Modern CNN Backbones

RLA is compatible with multiple widely-used backbone architectures, with the following integration strategies:

  • ResNet-50/101/152: Insert one RLA module per residual block; $g_1$ and $g_2$ are shared (tied) across all blocks within each stage of constant spatial resolution. The state $h$ is initialized to zero, downsampled between stages (by average pooling), and concatenated at the output before the classification FC layer.
  • Xception: RLA modules are shared per resolution group of depthwise-separable convolutional blocks, with a separable $g_2$.
  • MobileNetV2: The RLA state is concatenated after the first $1\times1$ expansion in each inverted bottleneck block to avoid channel explosion; $g_2$ uses depthwise-separable convolutions to control cost.

For all backbones, RLA maintains stage-wise weight sharing, hidden state spatial downsampling, and final concatenation for classification tasks (Zhao et al., 2021).

5. Complexity and Resource Overhead

Compared to standard backbones, RLA introduces minimal parameter and compute overhead, summarized as follows:

| Model | Params | FLOPs | Top-1 Err. (%) | Change |
| --- | --- | --- | --- | --- |
| ResNet-50 | 24.37M | 3.83G | 24.70 | |
| + RLA | 24.67M | 4.17G | 22.83 | +1.87 pp acc., +1.2% params, +9% FLOPs |
| ResNet-164 | 1.72M | 8.55M | 5.72 (C-10) | |
| + RLA | 1.74M | 8.74M | 4.95 (C-10) | -0.77 pp err., +1.2% params, +2.2% FLOPs |

Training time increases by 15–19% on ResNet-101/ImageNet, and inference speed is reduced by 2–3%. This resource increase is offset by the accuracy and task-performance improvements observed across datasets (Zhao et al., 2021).
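The relative overheads quoted in the Change column follow directly from the parameter, FLOP, and error figures above. The short sketch below reproduces that arithmetic; the helper name `relative_increase` and the dictionary layout are illustrative choices, while the numbers are copied from the table.

```python
def relative_increase(base: float, new: float) -> float:
    """Percentage increase of `new` over `base`."""
    return 100.0 * (new - base) / base


# (baseline, with RLA) pairs copied from the table above.
rows = {
    "ResNet-50 (ImageNet)": {"params": (24.37, 24.67), "flops": (3.83, 4.17), "err": (24.70, 22.83)},
    "ResNet-164 (CIFAR-10)": {"params": (1.72, 1.74), "flops": (8.55, 8.74), "err": (5.72, 4.95)},
}

for name, row in rows.items():
    params_pct = relative_increase(*row["params"])
    flops_pct = relative_increase(*row["flops"])
    err_drop = row["err"][0] - row["err"][1]  # percentage points
    print(f"{name}: params +{params_pct:.1f}%, FLOPs +{flops_pct:.1f}%, error -{err_drop:.2f} pp")
```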

6. Empirical Results on Standard Benchmarks

RLA has been systematically evaluated on CIFAR-10/100, ImageNet, and MS COCO. Accuracy improves consistently across backbones and tasks:

CIFAR-10/100 Test Error (%):

| Model | Params | FLOPs | C-10 | C-100 |
| --- | --- | --- | --- | --- |
| ResNet-110 | 1.73M | 8.67M | 6.35 | 28.51 |
| + RLA | 1.80M | 9.04M | 5.88 | 27.44 |
| ResNet-164 | 1.72M | 8.55M | 5.72 | 25.22 |
| + RLA | 1.74M | 8.74M | 4.95 | 23.78 |

ImageNet (single-crop) Top-1 / Top-5 Error:

| Model | Params | FLOPs | Top-1 | Top-5 |
| --- | --- | --- | --- | --- |
| ResNet-50 | 24.37M | 3.83G | 24.70 | 7.80 |
| + RLA | 24.67M | 4.17G | 22.83 | 6.58 |
| + ECA+RLA | 24.67M | 4.18G | 22.15 | 6.11 |
| RLA-ResNet50† | 24.67M | 4.17G | 20.25 | 5.12 |

MS COCO Object Detection / Segmentation:

  • Faster R-CNN @R-50: AP improves from 36.4 → 38.8 (+2.4)
  • Faster R-CNN @R-101: AP improves from 38.7 → 41.2 (+2.5)
  • RetinaNet @R-50: AP improves from 35.6 → 37.9 (+2.3)
  • Mask R-CNN @R-50: bbox AP from 37.2 → 39.5 (+2.3), mask AP from 34.1 → 35.6 (+1.5)

These consistent gains, with marginal computational and parameter penalty, demonstrate the practical advantages of RLA as a module for deep feature aggregation (Zhao et al., 2021).

7. Ablation Studies and Implementation Findings

Extensive ablation on CIFAR and ImageNet explores RLA's design:

  • Weight sharing across depth within stage is critical: shared RLA achieves lower error and parameter count than unshared.
  • Feature exchange: Disabling the two-way exchange between the $h$ and $x$ paths degrades performance.
  • ConvLSTM in place of the simple ConvRNN does not yield additional gains and increases resource usage.
  • Pre-activation (BN→tanh→Conv) for $g_2$ improves results vs. post-activation.
  • Connectivity: Among six variants tested, add-then-ConvRNN (RLA's choice) is optimal.
  • RLA channel size: The number of hidden-state channels was tuned on CIFAR-10 with ResNet-164 to identify the best-performing setting.

A concise pseudocode reference is provided in the original work, exemplifying stage-wise module structure, weight sharing, and recurrent updates within a PyTorch framework (Zhao et al., 2021).
