Multi-ResNet: Multi-branch Residual Networks

Updated 23 June 2026

Multi-ResNet is a neural network architecture that generalizes standard ResNets by incorporating multiple parallel or hierarchical residual branches to enhance expressivity and multi-scale feature aggregation.
The design enables an exponential ensemble of sub-networks via diversified residual pathways, which improves training dynamics and achieves state-of-the-art results in image classification and segmentation.
Variants like Res2Net and MP-ResNet demonstrate practical adaptations, offering improved accuracy and efficiency in specialized domains such as ECG analysis and semantic segmentation.

Multi-ResNets are a class of neural network architectures that generalize the residual learning paradigm of standard ResNets to include multiple parallel or hierarchical residual branches within each block. This multi-branching approach exposes new dimensions of expressivity, allows for more effective multi-scale feature aggregation, and can improve accuracy, training dynamics, or computational efficiency compared to prior single-branch models. Multi-ResNet architectures are realized in several distinct forms across the literature, notably the general Multi-Residual Network with parallel branches in each block (Abdi et al., 2016), multi-branch heads for imbalanced learning in time series (Xie et al., 2023), hierarchical intra-block multi-scale designs as in Res2Net (Gao et al., 2019), and multi-path deep architectures for semantic segmentation (Ding et al., 2020). These variants share the central concept of aggregating multiple residual mappings, by summation or concatenation, within a unified architectural framework. The following sections summarize the mathematical formulation, depth–width tradeoffs, empirical findings, parallel implementations, and application-specific adaptations of Multi-ResNets.

1. Mathematical Formulation and Block Structures

1.1 Classical Multi-Residual Block (Parallel Branches)

Given an input $x_\ell\in\mathbb{R}^d$ to block $\ell$, a standard pre-activation ResNet computes

$x_{\ell+1} = x_\ell + F_\ell(x_\ell)$

where $F_\ell(\cdot)$ is a conv-BN-ReLU mapping. The Multi-ResNet generalizes this by using $k$ parallel residual branches $\{F_\ell^i\}_{i=1}^k$:

$x_{\ell+1} = x_\ell + \sum_{i=1}^k F_\ell^i(x_\ell)$

All branches receive the same input and have parallel conv-BN-ReLU pipelines. This construction increases the number of functional sub-networks (i.e., possible residual pathways when each branch can be “on” or “off”) from $2^n$ in an $n$-block ResNet to $2^{kn}$ in an $n$-block, $k$-branch Multi-ResNet (Abdi et al., 2016).
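The block update above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: vectors are lists of floats, and `relu_scale` is a hypothetical stand-in for a real conv-BN-ReLU branch.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Branch = Callable[[Vector], Vector]


def multi_residual_block(x: Vector, branches: Sequence[Branch]) -> Vector:
    """Return x + sum_i F_i(x); every branch receives the same input x."""
    out = list(x)  # identity (skip) path
    for branch in branches:
        residual = branch(x)
        if len(residual) != len(x):
            raise ValueError("each branch must preserve the input dimension")
        out = [o + r for o, r in zip(out, residual)]
    return out


def relu_scale(weight: float) -> Branch:
    """Hypothetical toy branch: ReLU(weight * x), standing in for conv-BN-ReLU."""
    return lambda v: [max(0.0, weight * t) for t in v]


x = [1.0, -2.0, 3.0]
# k = 1 reduces to the standard residual block x + F(x).
y_single = multi_residual_block(x, [relu_scale(0.5)])
# k = 2 sums two parallel residual branches on top of the skip path.
y_multi = multi_residual_block(x, [relu_scale(0.5), relu_scale(-1.0)])
print(y_single)  # [1.5, -2.0, 4.5]
print(y_multi)   # [1.5, 0.0, 4.5]
```

With an empty branch list the block is the identity, which makes the skip path explicit.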

1.2 Hierarchical Multi-Scale (Res2Net Block)

Res2Net inserts a hierarchical residual-like design within each bottleneck block. If an input tensor is split into $s$ subsets $x_1, \dots, x_s$ along the channel dimension, the outputs are recursively defined as

$y_i = \begin{cases} x_i, & i = 1 \\ K_i(x_i), & i = 2 \\ K_i(x_i + y_{i-1}), & 2 < i \le s \end{cases}$

with each $K_i$ a (possibly group-wise) $3\times 3$ conv. The outputs $y_i$ are concatenated and fused with a $1\times 1$ conv, then added as a residual skip connection. This structure provides $s$ explicit receptive field granularities per block (Gao et al., 2019).
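The hierarchical recursion can be illustrated with a small sketch. It assumes the commonly stated Res2Net rule: the first channel group passes through unchanged, the second is transformed directly, and every later group is transformed after adding the previous group's output. The function name `res2net_hierarchy` and the toy `double` operator are hypothetical, and the final fusion conv and skip connection are omitted.

```python
from typing import Callable, List, Sequence

Channels = List[float]
Op = Callable[[Channels], Channels]


def res2net_hierarchy(x: Channels, ops: Sequence[Op]) -> Channels:
    """Split x into s = len(ops) + 1 equal groups and apply the hierarchy.

    ops[i - 1] plays the role of the per-group conv for group i (0-based),
    i = 1 .. s - 1; group 0 has no operator.
    """
    s = len(ops) + 1
    if len(x) % s != 0:
        raise ValueError("channel count must be divisible by the scale s")
    width = len(x) // s
    groups = [x[i * width:(i + 1) * width] for i in range(s)]

    outputs = [groups[0]]  # first group: identity
    for i in range(1, s):
        inp = groups[i]
        if i > 1:
            # Later groups also see the previous group's output,
            # which enlarges their effective receptive field.
            inp = [a + b for a, b in zip(groups[i], outputs[-1])]
        outputs.append(ops[i - 1](inp))

    # Concatenate along the channel dimension (fusion conv omitted here).
    return [v for group in outputs for v in group]


def double(group: Channels) -> Channels:
    """Toy stand-in for a conv: scale every channel by 2."""
    return [2.0 * v for v in group]


y = res2net_hierarchy([1.0, 2.0, 3.0, 4.0], [double, double, double])
print(y)  # [1.0, 4.0, 14.0, 36.0]
```

In the example, the last group has passed through three stacked operators while the first has passed through none, which is the sense in which one block exposes several granularities at once.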

1.3 Task-Specific Multi-Branching

For highly imbalanced classification tasks such as atrial fibrillation detection from ECG, each branch head processes a balanced sub-sample of the data, and the outputs are averaged only at inference (Xie et al., 2023). For semantic segmentation (MP-ResNet), three parallel encoder branches operate at different spatial strides to expand effective receptive field (Ding et al., 2020).
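One way to picture the balanced multi-head scheme is sketched below. The sampling rule is an assumption made for illustration (each head sees all minority samples plus an equally sized random draw from the majority class); the source does not specify the procedure used by Xie et al., and `balanced_subsets` and `ensemble_predict` are hypothetical names.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, int]  # (feature, binary label)


def balanced_subsets(data: Sequence[Sample], num_heads: int,
                     seed: int = 0) -> List[List[Sample]]:
    """Build one class-balanced training subset per branch head."""
    rng = random.Random(seed)
    pos = [d for d in data if d[1] == 1]
    neg = [d for d in data if d[1] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Each head gets every minority sample and a fresh majority draw.
    return [minority + rng.sample(majority, len(minority))
            for _ in range(num_heads)]


def ensemble_predict(heads: Sequence[Callable[[float], float]],
                     x: float) -> float:
    """Average the per-head scores; used only at inference time."""
    return sum(head(x) for head in heads) / len(heads)


# Toy data: 3 positive and 12 negative samples.
data = [(float(i), 1) for i in range(3)] + [(float(i), 0) for i in range(12)]
subsets = balanced_subsets(data, num_heads=4)

# Hypothetical trained heads, each returning a constant score.
heads = [lambda x: 0.25, lambda x: 0.5, lambda x: 0.75, lambda x: 0.5]
score = ensemble_predict(heads, 1.0)
print(len(subsets), score)  # 4 0.5
```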

2. Depth–Width Tradeoffs and Ensemble Interpretation

The multi-branch construction trades network depth for width. The ensemble view asserts that an $n$-block, $k$-branch Multi-ResNet forms an implicit ensemble of $2^{kn}$ sub-networks, exponentially increasing the number of distinct functional paths compared to standard ResNets. Empirical analysis (“effective range theory”) finds that most gradient signal during training traverses only shallow/mid-depth sub-paths (e.g., length 10–34 in a 110-layer net); deeper paths contribute minimally due to gradient attenuation.
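The sub-network count can be checked by brute force for small networks. The sketch below enumerates every on/off assignment of the $n \cdot k$ residual branches and also tabulates how many assignments have exactly $j$ active branches, which is a plain binomial count rather than the path-length analysis of the cited paper. Both function names are hypothetical.

```python
from itertools import product
from math import comb
from typing import List


def count_subnetworks(n_blocks: int, k_branches: int) -> int:
    """Count on/off assignments of all n * k branches by enumeration."""
    total = n_blocks * k_branches
    return sum(1 for _ in product((0, 1), repeat=total))


def configs_by_active_branches(n_blocks: int, k_branches: int) -> List[int]:
    """Entry j is the number of assignments with exactly j active branches."""
    total = n_blocks * k_branches
    return [comb(total, j) for j in range(total + 1)]


print(count_subnetworks(3, 1))           # 8  = 2**3, standard ResNet
print(count_subnetworks(3, 2))           # 64 = 2**(2*3), two branches per block
print(configs_by_active_branches(2, 2))  # [1, 4, 6, 4, 1]
```

The histogram peaks at mid-range counts, so adding branches mostly adds sub-networks with a moderate number of active residual functions.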

Once the backbone exceeds a threshold depth (about 20 blocks for CIFAR data), replacing further depth with added width (i.e., additional branches per block) preserves or improves accuracy under a fixed parameter count. For networks shallower than this threshold, increasing depth alone outperforms width augmentation (Abdi et al., 2016).
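As a rough illustration of the tradeoff, the sketch below lists the depth and branch-count pairs available under a fixed budget. It assumes, purely for illustration, that every branch costs the same number of parameters, so the budget is the product $n \cdot k$; the function name `depth_width_options` and the budget of 120 branch units are hypothetical, while the threshold of 20 blocks is the CIFAR figure quoted above.

```python
from typing import List, Tuple


def depth_width_options(branch_budget: int,
                        min_depth: int) -> List[Tuple[int, int]]:
    """Return all (n_blocks, k_branches) with n * k == branch_budget
    and n >= min_depth, ordered from deepest to widest."""
    options = []
    for k in range(1, branch_budget + 1):
        if branch_budget % k == 0:
            n = branch_budget // k
            if n >= min_depth:
                options.append((n, k))
    return options


# Configurations that keep at least 20 blocks of depth.
options = depth_width_options(branch_budget=120, min_depth=20)
print(options)
# [(120, 1), (60, 2), (40, 3), (30, 4), (24, 5), (20, 6)]
```

Under these assumptions, every listed pair spends the same budget; the claim in the text is that the wider entries are competitive with the deepest one once the depth threshold is met.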

3. Empirical Performance and Comparative Results

3.1 Image Classification: CIFAR and ImageNet

On CIFAR-10/100, Multi-ResNets with multiple branches per block achieve state-of-the-art or near state-of-the-art results at fixed parameter budgets. A 26-layer, 10$\times$ wide multi-branch Multi-ResNet achieves 3.96% error on CIFAR-10 and 19.45% on CIFAR-100; with a different branch count $k$, the CIFAR-10 error drops further to 3.73% (Abdi et al., 2016). On ImageNet, a 101-layer multi-branch Multi-ResNet obtains 21.53% top-1 error (single crop), outperforming ResNet-200 by 0.13% (top-1).

3.2 Multi-Scale Backbones: Res2Net Gains

Replacing the bottleneck blocks of ResNet-50 with Res2Net modules (Res2Net-50, $s=4$) decreases ImageNet top-1 error from 23.85% to 22.01%, with similar or greater gains for deeper models. On PASCAL VOC, Res2Net modules in detection and segmentation pipelines consistently improve mAP and mIoU by 1–5 points against baseline architectures (Gao et al., 2019).

3.3 Application-Specific Multi-ResNets

For ECG-based atrial fibrillation detection, a CWT-MB-ResNet using multiple branch heads achieves an AUROC of 97.6% and an F1 score of 0.8865 on the PhysioNet/CinC 2017 data, outperforming conventional deep models (Xie et al., 2023). For semantic segmentation of PolSAR imagery, MP-ResNet improves overall accuracy (OA) to 93.95% and frequency-weighted IoU (fwIoU) to 89.63%, outperforming a single-path FCN-ResNet34 and other state-of-the-art networks at moderate increases in parameters and FLOPs (Ding et al., 2020).

4. Parallel and Runtime-Optimized Designs

The inherent parallel structure of Multi-ResNet blocks supports model-parallel implementations on multi-GPU platforms. Assigning equal subsets of the parallel branches in each block to distinct GPUs minimizes communication and balances computation. On dual Nvidia K80 cards, multi-branch Multi-ResNets achieve 4–15% faster per-step wall-clock time compared to depth-matched baselines, with the maximum benefit at small batch sizes (threads are under-utilized below a batch size of 32 per GPU). A combination of model-parallel blocks on a single card and data-parallelism across cards yields speed-ups of up to 15%, with limited additional memory overhead (Abdi et al., 2016).
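The branch-to-device assignment can be sketched without any GPU library. The round-robin split below is an assumed scheme, not the partitioning reported by Abdi et al.; the devices are simulated by computing one partial sum per branch subset and then reducing, and both function names are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Branch = Callable[[Vector], Vector]


def assign_branches(num_branches: int, num_devices: int) -> List[List[int]]:
    """Round-robin split: device d gets branches d, d + num_devices, ..."""
    if num_devices < 1:
        raise ValueError("need at least one device")
    return [list(range(d, num_branches, num_devices))
            for d in range(num_devices)]


def block_forward_model_parallel(x: Vector, branches: Sequence[Branch],
                                 num_devices: int) -> Vector:
    """Compute x + sum_i F_i(x) as a sum of per-device partial sums."""
    partial_sums = []
    for branch_ids in assign_branches(len(branches), num_devices):
        # On real hardware this inner loop would run on its own GPU.
        partial = [0.0] * len(x)
        for i in branch_ids:
            partial = [p + r for p, r in zip(partial, branches[i](x))]
        partial_sums.append(partial)

    # Single reduction step: only one vector per device is communicated.
    out = list(x)
    for partial in partial_sums:
        out = [o + p for o, p in zip(out, partial)]
    return out


def scale(w: float) -> Branch:
    """Toy branch standing in for a conv-BN-ReLU pipeline."""
    return lambda v: [w * t for t in v]


branches = [scale(1.0), scale(2.0), scale(3.0), scale(4.0)]
print(assign_branches(4, 2))                                   # [[0, 2], [1, 3]]
print(block_forward_model_parallel([1.0, 2.0], branches, 2))   # [11.0, 22.0]
```

Because the branches are summed, each device only has to return one tensor per block, which is why communication stays low in this layout.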

5. Theoretical Insights and Effective Path Analysis

Deep ResNets and Multi-ResNets efficiently implement massive implicit ensembles of varying-depth subnetworks, with the training signal concentrated on relatively shallow paths. By explicitly introducing more parallel branches in each block, the number of shallow and medium-depth trainable paths is increased without extending overall depth, thus reducing vanishing gradient hazards and increasing representational power where it is best leveraged (Abdi et al., 2016).

6. Domain-Specific Multi-ResNet Variants

A selection of domain-specific Multi-ResNet variants demonstrates the adaptability of the core multi-branching paradigm:

Multi-branch Heads for Imbalanced Classification: Each branch head independently learns on a balanced subset, with output fusion at inference. Used for robust ECG-based arrhythmia detection (Xie et al., 2023).
Multi-Path Deep Encoders: As in MP-ResNet for PolSAR semantic segmentation, multiple encoder paths span different spatial strides (e.g., 1/8, 1/16, 1/32), and are fused by coarse-to-fine multi-level sum in the decoder. This design captures both fine and global context (Ding et al., 2020).
Hierarchical Within-Block Design for Granular Multi-Scale: Res2Net’s intra-block scale increases object size sensitivity and broadens the effective receptive field within a fixed-depth architecture (Gao et al., 2019).
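The coarse-to-fine summation used by multi-path encoders can be illustrated on 1-D feature sequences. This is a simplified sketch: the nearest-neighbor upsampling, the 1-D layout, and the function names are assumptions for illustration, and the actual MP-ResNet decoder is not specified in this article.

```python
from typing import List, Sequence

Feature = List[float]


def upsample2(feature: Feature) -> Feature:
    """Nearest-neighbor upsampling by a factor of 2."""
    return [v for v in feature for _ in range(2)]


def coarse_to_fine_sum(features: Sequence[Feature]) -> Feature:
    """Fuse features ordered fine to coarse (e.g. strides 1/8, 1/16, 1/32).

    Start from the coarsest map, upsample it to the next finer
    resolution, add that level's features, and repeat.
    """
    fused = list(features[-1])
    for finer in reversed(features[:-1]):
        fused = upsample2(fused)
        if len(fused) != len(finer):
            raise ValueError("adjacent levels must differ by a factor of 2")
        fused = [a + b for a, b in zip(fused, finer)]
    return fused


f8 = [1.0, 2.0, 3.0, 4.0]   # stride 1/8 path (finest)
f16 = [10.0, 20.0]          # stride 1/16 path
f32 = [100.0]               # stride 1/32 path (coarsest, most global)
fused = coarse_to_fine_sum([f8, f16, f32])
print(fused)  # [111.0, 112.0, 123.0, 124.0]
```

Every output position carries the global value from the coarsest path plus progressively more local detail from the finer paths.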

7. Summary Table: Key Multi-ResNet Variants

Architecture Variant	Multi-Branch Pattern	Empirical Domains
Multi-ResNet (Abdi et al., 2016)	$k$ parallel conv-BN-ReLU branches	CIFAR, ImageNet
CWT-MB-ResNet (Xie et al., 2023)	Multi-head, data-balanced branches	ECG time–frequency classification
Res2Net (Gao et al., 2019)	Hierarchical recursive intra-block	ImageNet, Detection/Segmentation
MP-ResNet (Ding et al., 2020)	Multi-path encoder, multi-scale	PolSAR semantic segmentation

Each implementation reflects the general principle of leveraging multiple residual flows—either in parallel, recursively, or across multiple encoder paths—to improve expressivity, accuracy, or computational performance in diverse deep learning contexts.