1D Res2Net Modules for Sequence Modeling
- 1-Dimensional Res2Net modules extend ResNet bottlenecks by splitting channels into multiple scales to capture both fine and coarse temporal features.
- Gated variants such as CG-Res2Net and GRes2Net dynamically modulate inter-channel information flow, enhancing multi-scale feature representation.
- This architecture improves efficiency and accuracy in sequence modeling tasks like synthetic speech detection and time series analysis.
The one-dimensional (1D) Res2Net module is an architectural extension of the ResNet bottleneck block designed specifically for sequence modeling tasks, such as time series analysis and 1D signal processing. It generalizes multi-scale representation learning to the temporal or sequential domain by constructing hierarchical residual-like connections across distinct channel groups within each block. This design supports flexible receptive fields and encourages both fine- and coarse-scale feature interactions, offering advantages for complex time-dependent learning scenarios.
1. Architectural Principles of the 1D Res2Net Module
The 1D Res2Net module modifies the standard ResNet bottleneck by splitting the expanded feature channels into parallel groups, termed scales (Li et al., 2021, Yang et al., 2020). Given an input $X \in \mathbb{R}^{C \times T}$, where $C$ is the number of channels and $T$ is the temporal length, a 1×1 convolution expands $X$ to $U \in \mathbb{R}^{sw \times T}$, where $s$ is the number of scales and $w$ is the per-group channel width. The expanded tensor is then split into groups $x_1, \dots, x_s$, each $x_i \in \mathbb{R}^{w \times T}$.
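To make the channel arithmetic concrete, here is a minimal pure-Python sketch of the split; the helper name and shapes are illustrative, not taken from the cited papers:

```python
# Channel bookkeeping for the Res2Net split: a 1x1 conv expands the C
# input channels to s * w channels, which are then cut into s groups of
# w channels each (shapes are (channels, time); batch dimension omitted).
def split_channels(num_channels: int, scales: int) -> list:
    """Return the channel index range owned by each of the `scales` groups."""
    assert num_channels % scales == 0, "expanded width must divide evenly"
    w = num_channels // scales
    return [range(i * w, (i + 1) * w) for i in range(scales)]

s, w = 4, 16                       # scales s and per-group width w
groups = split_channels(s * w, s)  # 64 expanded channels -> 4 groups of 16
```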
The inner multi-scale processing is defined recursively:
- $y_1 = x_1$
- $y_2 = K_2(x_2)$
- $y_i = K_i(x_i + y_{i-1})$, for $2 < i \le s$,

where $K_i$ denotes a 1D convolution on group $i$ (typically with kernel size 3 and padding 1). The outputs $y_1, \dots, y_s$ are concatenated along the channel axis, compressed via a second 1×1 convolution, and combined with the residual connection from $X$, followed by batch normalization and ReLU:

$Y = \mathrm{ReLU}\big(\mathrm{BN}\big(X + \mathrm{Conv}_{1 \times 1}([y_1; \dots; y_s])\big)\big)$ (Li et al., 2021, Yang et al., 2020).
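The recursion can be traced end-to-end with a small framework-free sketch; the fixed smoothing kernel below is a toy stand-in for a learned per-group convolution, and each group is reduced to a single channel for readability:

```python
def conv1d_k3(x, kernel=(0.25, 0.5, 0.25)):
    """Toy 1D convolution: kernel size 3, zero padding 1 (stand-in for K_i)."""
    padded = [0.0] + list(x) + [0.0]
    return [sum(k * padded[t + j] for j, k in enumerate(kernel))
            for t in range(len(x))]

def res2net_multiscale(groups):
    """Hierarchical Res2Net recursion: y1 = x1, y2 = K2(x2),
    y_i = K_i(x_i + y_{i-1}) for i > 2."""
    y = [groups[0]]                    # y1 = x1 (identity branch)
    y.append(conv1d_k3(groups[1]))     # y2 = K2(x2)
    for xi in groups[2:]:
        fused = [a + b for a, b in zip(xi, y[-1])]
        y.append(conv1d_k3(fused))     # y_i = K_i(x_i + y_{i-1})
    return y                           # concatenated along channels in practice

# four single-channel groups over T = 4 time steps (an impulse signal)
groups = [[1.0, 0.0, 0.0, 0.0]] * 4
outputs = res2net_multiscale(groups)
```

Note how later groups see the impulse spread over more time steps: each recursion step applies another kernel-size-3 convolution, so the effective receptive field grows with the group index.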
2. Gated Extensions and Channel-wise Control
To enhance the ability of the block to selectively propagate information across groups and better handle inter-channel correlations, gated variants have been proposed.
- CG-Res2Net (Li et al., 2021) introduces a channel-wise gate $g_i$ to modulate information flow from $y_{i-1}$ to $x_i$. The residual-like connection is modified as:

$y_i = K_i(x_i + g_i \odot y_{i-1})$, where $\odot$ denotes channel-wise multiplication.

The gate $g_i$ is computed via mechanisms such as global average pooling of the feature maps, followed by fully connected layers and a sigmoid activation.
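A channel-wise gate of this kind can be sketched in plain Python; the single linear layer here is an illustrative stand-in for the fully connected stack:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def channel_gate(feature_map, weights, bias):
    """Channel-wise gate in the spirit of CG-Res2Net: global average pooling
    over time per channel, a linear layer, then a sigmoid squashing to (0, 1).
    `feature_map` is a list of channels, each a list of T values; the single
    linear layer (weights, bias) stands in for the learned FC layers."""
    pooled = [sum(ch) / len(ch) for ch in feature_map]           # GAP over time
    return [sigmoid(sum(w * p for w, p in zip(row, pooled)) + b)
            for row, b in zip(weights, bias)]                    # one gate per channel

fmap = [[1.0, 3.0], [2.0, 2.0]]    # 2 channels, T = 2 time steps
gates = channel_gate(fmap, weights=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0])
```

Because the gate is pooled over time, each channel receives a single scalar weight in (0, 1), so the same suppression or emphasis is applied at every time step of that channel.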
- Gated Res2Net (GRes2Net) (Yang et al., 2020) further generalizes gating, making the gate $G_i \in \mathbb{R}^{w \times T}$ (per batch, width, and length) dependent on the current input $x_i$, the previous output $y_{i-1}$, and the block's original input $X$:

$G_i = \tanh\big(F_1([X; x_i]) + F_2(y_{i-1})\big),$

where $F_1$ and $F_2$ are 1×1 conv + BatchNorm + ReLU modules. The fusion at each hierarchical step is:

$y_i = K_i(x_i + G_i \odot y_{i-1}).$
This formulation allows the model to learn dynamic, per-element modulation for each hierarchical connection, enhancing multi-scale temporal feature learning.
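The per-element fusion can be illustrated with a framework-free sketch in which the gate values are fixed; in the actual block they would be produced by the learned 1×1 conv gating modules:

```python
import math

def gres2net_fusion(xi, y_prev, gate):
    """GRes2Net-style fusion: each element of the previous group's output is
    scaled by its own gate value before being added to the current group.
    In the real block the gate comes from 1x1 conv + BatchNorm submodules."""
    return [x + g * y for x, g, y in zip(xi, gate, y_prev)]

xi     = [1.0, 1.0, 1.0]                                    # current group
y_prev = [2.0, 2.0, 2.0]                                    # previous output
gate   = [math.tanh(0.0), math.tanh(1.0), math.tanh(-1.0)]  # per-element gate
fused  = gres2net_fusion(xi, y_prev, gate)                  # then K_i(fused)
```

Unlike the channel-wise CG gate, each (channel, time) position gets its own modulation value, and a tanh gate can also flip the sign of the contribution from the previous scale.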
The modules may use different gating computation strategies, such as single-group and multi-group gates, with or without latent space projections, as detailed in (Li et al., 2021).
3. Implementation Details and Hyperparameterization
The main architectural parameters of 1D Res2Net and its gated variants are:
| Parameter | Typical Values | Role |
|---|---|---|
| Scales ($s$) | 4 | Number of parallel groups (feature splits) in each block |
| Channels per group ($w$) | 12–32 | Number of channels per group after expansion |
| Conv kernel size | 3 | Kernel size for the per-group 1D convolutions |
| Gating type | None / CG / G | Plain (vanilla), channel-gated (CG-Res2Net), or fully gated (GRes2Net) |
| Gate activation | Sigmoid / tanh | Controls the range of the channel-wise or per-element gates |
| Channel expansion | 1×1 conv | Expands from $C$ to $sw$ channels as input to the multi-scale block |
| Channel compression | 1×1 conv | Reduces from $sw$ channels to the desired output width after multi-scale fusion |
The 1D GRes2Net block outlined in (Yang et al., 2020) emphasizes that all convolutions (1×1, 3×1) are followed by BatchNorm and ReLU, with gating submodules kept lightweight to minimize parameter overhead.
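These hyperparameters can be collected into a single configuration object; the class and field names below are illustrative, not taken from the cited implementations:

```python
from dataclasses import dataclass

@dataclass
class Res2Net1dConfig:
    """Hyperparameters of a 1D Res2Net block, mirroring the table above."""
    scales: int = 4                   # number of parallel groups s
    width: int = 16                   # channels per group w after expansion
    kernel_size: int = 3              # per-group 1D conv kernel size
    gating: str = "none"              # "none", "cg" (channel gate), or "g" (GRes2Net)
    gate_activation: str = "sigmoid"  # "sigmoid" or "tanh"

    @property
    def expanded_channels(self) -> int:
        """Channel count after the 1x1 expansion conv (s * w)."""
        return self.scales * self.width

cfg = Res2Net1dConfig(scales=4, width=16, gating="cg")
```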
4. Pseudocode Representation
Representative PyTorch-style pseudocode for a generic 1D gated Res2Net block is provided in (Yang et al., 2020, Li et al., 2021). The forward pass, with the gate computation abstracted into a helper for brevity, follows this pattern:
```python
U = Conv1x1_expand(X)
U = BatchNorm(U); U = ReLU(U)
chunks = split_channels(U, groups=s)
y = []
for i, xi in enumerate(chunks):
    if i == 0:
        yi = xi
    elif i == 1:
        yi = K2_conv(xi); yi = BatchNorm(yi); yi = ReLU(yi)
    else:
        g = compute_gate(X, y[i-1], xi)  # e.g., via 1x1 convs + tanh
        fused = xi + g * y[i-1]
        yi = Ki_conv(fused); yi = BatchNorm(yi); yi = ReLU(yi)
    y.append(yi)
Y_concat = concatenate(y)
Y = Conv1x1_compress(Y_concat)
Y = BatchNorm(Y); Y = ReLU(Y)
return Y
```
5. Comparative Properties and Significance
The 1D Res2Net and its gated variants introduce flexible hierarchical receptive fields within a single convolutional block, as opposed to stacking deeper layers for multi-scale aggregation. This enables:
- Simultaneous local and broad temporal receptive fields within each block.
- Dynamic, learned weighting of information passed across scales (gated variants).
- Improved efficiency for sequence modeling, as the multi-scale hierarchy within a block replaces the need for deeper stacks or auxiliary context modules.
Empirically, such architectures have demonstrated consistent gains over vanilla ResNet-style 1D CNNs for tasks including synthetic speech detection (Li et al., 2021) and multivariate time series classification/forecasting (Yang et al., 2020). Gating mechanisms provide further accuracy improvements by suppressing irrelevant information and promoting robust multi-scale dependency learning.
6. Applications and Impact
1D Res2Net modules are employed in deep learning models targeting sequential and temporal domains, notably:
- Synthetic speech artifact detection systems, where they improve generalization to unseen spoofing attacks via flexible receptive field adaptation and channel-wise selection (Li et al., 2021).
- Multivariate time series analysis, both for classification and forecasting, where hierarchical gating yields state-of-the-art performance, including more accurate temporal feature extraction and correlation modeling (Yang et al., 2020).
A plausible implication is that the gating-enhanced Res2Net structures could be particularly advantageous in domains where input variables exhibit time-dependent and context-sensitive importance, as in sensor fusion, audio, and biomedical signal processing.
7. Relation to Original Res2Net and Extensions
The one-dimensional Res2Net block can be viewed as a direct analogue to the original Res2Net architecture proposed for 2D image processing (Gao et al., CVPR 2020), but adapted to operate exclusively along the temporal or sequence axis. Unlike the image domain, 1D applications often benefit from finer gating granularity, as temporal and inter-channel dependencies are more variable and task-specific.
A notable distinction is the proliferation of gating mechanisms in 1D variants (CG-Res2Net, GRes2Net), reflecting the increased utility of per-channel or per-step selection in sequential modeling. These gating innovations have not only extended the expressive power but have empirically demonstrated value on standard sequence learning benchmarks (Li et al., 2021, Yang et al., 2020).