1D UNet: Efficient Sequential Segmentation

Updated 8 January 2026
  • 1D UNet is a neural network architecture that replaces 2D convolutions with 1D operations, maintaining U-Net’s multi-scale structure via skip connections.
  • It employs an encoder–decoder design with PixelUnshuffle/PixelShuffle operations and residual blocks to significantly reduce parameters and boost computational efficiency.
  • Applications include image segmentation and time-series event detection, achieving up to 71% model reduction and substantial FLOP savings compared to traditional U-Net models.

A 1D UNet is an encoder–decoder neural network architecture adapted from the classic U-Net, in which one-dimensional (1D) convolutional layers replace the typical two-dimensional (2D) convolutional layers. Designed for data modalities with separable spatial or sequential structure—such as time series, audio, or even images via specific reshaping—the 1D UNet preserves the characteristic multi-scale topology and skip connections of its predecessor while achieving substantial improvements in computational efficiency and model compactness. Its recent manifestations include channel-wise 1D convolutional variants for image segmentation (Byun et al., 2024) and residual 1D UNet models for event segmentation in time-series data, such as electroencephalography (EEG) (Sengupta et al., 1 Jan 2026).

1. Architectural Principles of the 1D UNet

Fundamental to all UNet variants is the encoder–decoder paradigm, comprising sequential downsampling blocks followed by corresponding upsampling blocks connected via skip connections. In the 1D UNet, each block replaces 2D convolutions with 1D convolutions configured for the data’s shape.

OneNet (Byun et al., 2024) implements 1D convolutions channel-wise, specifically for image segmentation. The downsampling occurs through PixelUnshuffle operations, which transfer spatial windows into the channel dimension. Each encoder block consists of two channel-wise 1D convolutional layers that operate on a flattened spatial dimension, optionally interleaved with spatial mixing 1D convolutions. The decoder mirrors the encoder, employing PixelShuffle for upsampling and channel-wise 1D convolutions for feature generation. This strategy enables significant parameter reduction and efficient inference.
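
As a concrete illustration, the following is a minimal PyTorch sketch of a channel-wise 1D encoder block in the spirit of OneNet; the channel widths, activation, and kernel-size-1 convolutions are illustrative assumptions rather than the paper's exact configuration, and the optional spatial-mixing convolutions are omitted.

```python
import torch
import torch.nn as nn

class ChannelWise1DEncoderBlock(nn.Module):
    """Sketch of a OneNet-style encoder block: PixelUnshuffle downsampling followed by
    two channel-wise 1D convolutions over the flattened spatial axis."""
    def __init__(self, in_ch, out_ch, scale=2):
        super().__init__()
        self.unshuffle = nn.PixelUnshuffle(scale)      # fold s x s spatial windows into channels
        c_unshuffled = in_ch * scale ** 2
        self.conv1 = nn.Conv1d(c_unshuffled, out_ch, kernel_size=1)
        self.conv2 = nn.Conv1d(out_ch, out_ch, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):                              # x: (B, C, H, W)
        y = self.unshuffle(x)                          # (B, C*s^2, H/s, W/s)
        b, c, h, w = y.shape
        z = y.reshape(b, c, h * w)                     # flatten space into a "time" axis
        z = self.act(self.conv1(z))                    # channel-wise mixing, per spatial position
        z = self.act(self.conv2(z))
        return z.reshape(b, -1, h, w)                  # restore the spatial grid

# One downsampling step on a 3-channel 64x64 image
block = ChannelWise1DEncoderBlock(in_ch=3, out_ch=32)
print(block(torch.randn(1, 3, 64, 64)).shape)          # torch.Size([1, 32, 32, 32])
```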

For time-series data, such as EEG, AugUNet1D (Sengupta et al., 1 Jan 2026) employs strided 1D convolutions, residual blocks, and max-pooling for encoder downsampling. Each residual block includes two Conv1d layers (kernel size 3, padding 1) with batch normalization and ReLU, with an identity shortcut to enhance gradient flow and enable deeper architectures. The decoder stage uses ConvTranspose1d for upsampling, concatenates skip connections, and applies further residual blocks.
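
A minimal PyTorch sketch of such a residual 1D block and one encoder downsampling step is given below; the channel widths and the 1×1 shortcut projection (used here when channel counts differ) are assumptions for illustration, not the paper's exact settings.

```python
import torch
import torch.nn as nn

class ResBlock1D(nn.Module):
    """Residual 1D block as described for AugUNet1D: two Conv1d layers (kernel 3, padding 1)
    with batch normalization and ReLU, plus a shortcut connection."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm1d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv1d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm1d(out_ch),
        )
        # Assumed 1x1 projection so the shortcut matches when channel counts differ
        self.shortcut = (nn.Identity() if in_ch == out_ch
                         else nn.Conv1d(in_ch, out_ch, kernel_size=1))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):                               # x: (B, C, T)
        return self.act(self.body(x) + self.shortcut(x))

# Encoder step: residual block followed by max-pooling for temporal downsampling
x = torch.randn(4, 1, 2000)                             # a 2000-sample EEG window
enc = nn.Sequential(ResBlock1D(1, 16), nn.MaxPool1d(2))
print(enc(x).shape)                                     # torch.Size([4, 16, 1000])
```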

2. Mathematical Operations and Data Transformations

The central mathematical operation in the 1D UNet is the 1D convolution, defined for an input $x[n]$ and kernel $w[k]$ (size $K$, padding $p$) as:

$$(x * w)[n] = \sum_{k=0}^{K-1} w[k]\, x[n + k - p]$$
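
This is the cross-correlation computed by standard 1D convolution layers; a short sanity check of the formula against torch.nn.functional.conv1d, with arbitrary illustrative sizes:

```python
import torch
import torch.nn.functional as F

x = torch.randn(16)       # input signal, length N = 16
w = torch.randn(3)        # kernel, K = 3
p, K = 1, w.numel()

# PyTorch's conv1d (a cross-correlation) expects (batch, channels, length)
out_torch = F.conv1d(x.view(1, 1, -1), w.view(1, 1, -1), padding=p).view(-1)

# Direct evaluation of (x * w)[n] = sum_k w[k] * x[n + k - p] with zero padding
xp = F.pad(x, (p, p))
out_manual = torch.stack([(w * xp[n:n + K]).sum() for n in range(x.numel())])

assert torch.allclose(out_torch, out_manual, atol=1e-6)
```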

In OneNet, after the PixelUnshuffle transformation, the spatial dimensions ($H \times W$) of the input are folded into the channel dimension, producing $Y \in \mathbb{R}^{B \times C' \times h \times w}$ for a batch size $B$, with $C' = C \cdot s^2$ where $s$ is the scale factor. The subsequent channel-wise 1D convolution is realized by reshaping to $Z = \mathrm{reshape}(Y, [B, C', M])$ with $M = h \cdot w$, and computing

$$O_{b,o,m} = \sum_{i=0}^{C'-1} W_{o,i}\, Z_{b,i,m},$$

where $W \in \mathbb{R}^{C_{\mathrm{out}} \times C'}$ is the convolutional weight matrix.
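
This channel mixing is exactly a kernel-size-1 Conv1d applied to the flattened tensor; the following sketch, with illustrative tensor sizes, checks the equivalence:

```python
import torch
import torch.nn as nn

B, C, H, W, s = 2, 8, 32, 32, 2
x = torch.randn(B, C, H, W)

y = nn.PixelUnshuffle(s)(x)                     # (B, C*s^2, H/s, W/s), here C' = 32
Bc, Cp, h, w = y.shape
z = y.reshape(Bc, Cp, h * w)                    # Z: (B, C', M) with M = h*w

# Channel-wise 1D convolution with kernel size 1 realizes O[b,o,m] = sum_i W[o,i] Z[b,i,m]
conv = nn.Conv1d(Cp, 16, kernel_size=1, bias=False)
o = conv(z)                                     # (B, 16, M)

# The same mapping written as a per-position matrix multiply over channels
o_ref = torch.einsum('oi,bim->bom', conv.weight.squeeze(-1), z)
assert torch.allclose(o, o_ref, atol=1e-5)
```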

PixelUnshuffle and PixelShuffle respectively downsample and upsample by reordering elements between the spatial and channel dimensions:

$$D(X)_{b,\, c \cdot s^2 + i,\, h,\, w} = X_{b,\, c,\, h \cdot s + \lfloor i / s \rfloor,\, w \cdot s + (i \bmod s)}$$

$$S(Y)_{b,\, c,\, h \cdot s + \lfloor i / s \rfloor,\, w \cdot s + (i \bmod s)} = Y_{b,\, c \cdot s^2 + i,\, h,\, w}$$
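
A short PyTorch check of these reorderings, using nn.PixelUnshuffle and nn.PixelShuffle with illustrative sizes:

```python
import torch
import torch.nn as nn

s = 2
x = torch.randn(1, 3, 4, 4)
down, up = nn.PixelUnshuffle(s), nn.PixelShuffle(s)

y = down(x)                     # (1, 3*s^2, 2, 2)
assert torch.equal(up(y), x)    # S(D(X)) = X: the two reorderings are inverses

# Spot-check the index formula D(X)[b, c*s^2 + i, h, w] = X[b, c, h*s + i//s, w*s + i%s]
b, c, i, h, w = 0, 1, 3, 0, 1
assert y[b, c * s**2 + i, h, w] == x[b, c, h * s + i // s, w * s + i % s]
```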

In time-series applications, max-pooling and transposed convolution control down- and up-sampling:

$$\mathrm{MaxPool1d}(x)[m] = \max_{0 \leq i < s} x[s\, m + i]$$

$$\mathrm{ConvTranspose1d}(x)[n] = \sum_{i} x[i]\, w[n - i]$$
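
A minimal sketch of how these operators change the temporal resolution in PyTorch (the kernel sizes and strides are chosen for illustration):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 8)

# MaxPool1d with window s = 2 halves the temporal resolution
pool = nn.MaxPool1d(kernel_size=2, stride=2)
print(pool(x).shape)            # torch.Size([1, 1, 4])

# ConvTranspose1d with stride 2 acts as a learned upsampler in the decoder path
up = nn.ConvTranspose1d(1, 1, kernel_size=2, stride=2)
print(up(pool(x)).shape)        # torch.Size([1, 1, 8])
```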

3. Computational Efficiency and Model Compactness

Substituting 2D convolutions with 1D convolutions yields substantial reductions in both parameter count and total FLOPs. In OneNet, an encoder block with scale $s = 2$ requires

$$P_{\mathrm{1D\ block}} = (4C \cdot 2C) + (2C \cdot 2C) = 12C^2$$

parameters, whereas a typical 2D block requires

$$P_{\mathrm{2D\ block}} = 54C^2.$$

Each 1D block therefore uses only about 22% as many parameters.

Summed across $L = 4$ layers with base width $C_0 = 64$, OneNet's encoder-only network uses 16.39M parameters (a 47% reduction vs. 31.04M for U-Net), while the fully 1D encoder–decoder variant (OneNet₍ₑd₎,₄) reduces this further to 9.08M (a 71% reduction).
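
The per-block counts can be verified directly; the sketch below assumes the 2D baseline block is two 3×3 convolutions (C→2C, 2C→2C) and the 1D block is two kernel-size-1 convolutions after PixelUnshuffle with $s = 2$, which reproduces the $12C^2$ and $54C^2$ figures above.

```python
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

C = 64

# 1D block: after PixelUnshuffle the channel count is s^2*C = 4C; two kernel-size-1 convs,
# 4C -> 2C and 2C -> 2C, give (4C*2C) + (2C*2C) = 12C^2 weights (biases ignored)
block_1d = nn.Sequential(
    nn.Conv1d(4 * C, 2 * C, kernel_size=1, bias=False),
    nn.Conv1d(2 * C, 2 * C, kernel_size=1, bias=False),
)

# Typical 2D block: two 3x3 convs, C -> 2C and 2C -> 2C, give 9*(2C^2 + 4C^2) = 54C^2 weights
block_2d = nn.Sequential(
    nn.Conv2d(C, 2 * C, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(2 * C, 2 * C, kernel_size=3, padding=1, bias=False),
)

print(n_params(block_1d), 12 * C**2)                 # 49152 49152
print(n_params(block_2d), 54 * C**2)                 # 221184 221184
print(n_params(block_1d) / n_params(block_2d))       # ~0.22
```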

In terms of FLOPs, for $256 \times 256$ images:

  • U-Net₄: 104.7 GFLOPs
  • OneNetₑ,₄: 78.4 GFLOPs (−25%)
  • OneNet₍ₑd₎,₄: 22.9 GFLOPs (−78%)

AugUNet1D (Sengupta et al., 1 Jan 2026) does not report FLOPs or parameter counts directly, but comparable model compression and acceleration are plausible implications of the same architectural substitution.

4. Application Domains and Implementation Paradigms

The channel-wise 1D UNet paradigm is adaptable to a variety of settings:

  • Image Segmentation: OneNet applies channel-wise 1D convs to images, leveraging pixel-repositioning for multi-scale spatial context extraction, and preserves segmentation accuracy relative to standard U-Net architectures (Byun et al., 2024).
  • Time Series Event Segmentation: AugUNet1D detects spike wave discharges in continuous mouse EEG recordings through windowed 1D convolutions and residual topology, achieving state-of-the-art event-level segmentation (Sengupta et al., 1 Jan 2026).
  • Generalization to Other Structured Data: The architecture is applicable to any modality with separable spatial or sequential structure, including audio spectrograms and other dense prediction tasks.

For implementation, OneNet provides PyTorch-style pseudocode for encoder blocks, demonstrating the integration of PixelUnshuffle and Conv1d layers to flatten spatial dimensions into a “time” axis, followed by channel-wise and optional spatial mixing convolutions.

5. Performance Benchmarks and Comparative Evaluation

Empirical results extracted from benchmark studies:

| Method | VOC mIoU | PET_F mIoU | PET_S mIoU | Heart mIoU | Brain mIoU | Lung mIoU |
|---|---|---|---|---|---|---|
| U-Net₄ | 0.182 | 0.316 | 0.713 | 0.063 | 0.001 | 0.009 |
| ResNet₃₄-U-Net | 0.332 | 0.597 | 0.801 | 0.065 | 0.079 | 0.009 |
| MobileNet-U-Net | 0.166 | 0.252 | 0.664 | 0.047 | 0.011 | 0.008 |
| OneNetₑ,₄ | 0.160 | 0.216 | 0.636 | 0.066 | 0.105 | 0.009 |
| OneNet₍ₑd₎,₄ | 0.149 | 0.172 | 0.535 | 0.062 | 0.099 | 0.008 |

Encoder-only OneNet achieves 47% parameter reduction with ≤ 1% drop on medical imaging tasks, and 10–15% on general-purpose masks. The full encoder–decoder reduces model size by 71%, with a further 5–10% mIoU drop on more complex scenes.

  • Point-wise F1-score (SWD detection in mouse EEG; Sengupta et al., 1 Jan 2026):
    • Vanilla 1D UNet: 0.59 ± 0.15
    • 1D Residual UNet: 0.4268 (no augmentation)
    • AugUNet1D: 0.90 ± 0.01
    • Twin Peaks: 0.69 ± 0.00
  • Event-level F1-score:
    • AugUNet1D: 0.95 ± 0.04
    • Twin Peaks: 0.71 ± 0.15

Ablation studies confirm amplitude scaling as the most effective augmentation in enhancing cross-subject generalization, with scaling-only achieving F1 = 0.8609. Combining scaling, noise, and inversion achieves the best aggregate metric (F1 = 0.8848).

6. Data Augmentation, Training Methodologies, and Robustness

AugUNet1D (Sengupta et al., 1 Jan 2026) uses three targeted data augmentations (a code sketch follows the list):

  • Amplitude scaling: $\alpha \sim U(0.5, 1.5)$, applied with probability $p = 0.5$
  • Additive Gaussian noise: $\varepsilon \sim \mathcal{N}(0, \sigma^2)$, applied with probability $p = 0.5$
  • Signal inversion: applied with probability $p = 0.2$
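
A minimal sketch of these augmentations applied to a single 1D window; the noise standard deviation sigma is an assumed placeholder value, not the paper's setting.

```python
import torch

def augment(x, sigma=0.1):
    """Apply the three augmentations above to a 1D window x of shape (C, T).
    sigma is an assumed placeholder; the paper's noise level is not restated here."""
    if torch.rand(1) < 0.5:                            # amplitude scaling, p = 0.5
        x = x * torch.empty(1).uniform_(0.5, 1.5)
    if torch.rand(1) < 0.5:                            # additive Gaussian noise, p = 0.5
        x = x + sigma * torch.randn_like(x)
    if torch.rand(1) < 0.2:                            # signal inversion, p = 0.2
        x = -x
    return x
```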

The model is trained per window ($T = 2000$ timepoints) with the Dice loss

$$\mathcal{L}_{\mathrm{Dice}} = 1 - \frac{2\sum_{t=1}^{T}\hat y_t\, y_t}{\sum_{t=1}^{T}\hat y_t + \sum_{t=1}^{T} y_t + \epsilon},$$

using Adam optimization, early stopping, and a cosine-annealing learning rate schedule. Validation is performed on held-out mouse recordings, with results averaged over three runs.
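
A minimal PyTorch sketch of this per-window Dice loss (the batch handling and $\epsilon$ value are illustrative assumptions):

```python
import torch

def dice_loss(y_hat, y, eps=1e-6):
    """Soft Dice loss per window, matching the formula above.
    y_hat: predicted probabilities in [0, 1]; y: binary labels; both of shape (B, T)."""
    intersection = (y_hat * y).sum(dim=-1)
    denom = y_hat.sum(dim=-1) + y.sum(dim=-1) + eps
    return (1 - 2 * intersection / denom).mean()

# Example on a batch of 2000-sample windows
y_hat = torch.sigmoid(torch.randn(4, 2000))
y = (torch.rand(4, 2000) > 0.95).float()
print(dice_loss(y_hat, y))
```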

Robustness to reduced training set size is empirically validated: AugUNet1D achieves F1 = 0.8192 with 5% of the labeled data and 0.8618 with 90%, suggesting effective utilization of augmentation and strong generalization.

7. Significance, Applicability, and Future Implications

The transition from 2D to 1D convolutions in UNet architectures enables practical edge deployment of segmentation models, thanks to drastic reductions in parameters and computational demands—up to 71% model size reduction and 78% FLOPs decrease as shown in OneNet (Byun et al., 2024). In time-series event segmentation, the residual 1D UNet (AugUNet1D) attains state-of-the-art performance in SWD detection with precise temporal localization, outperforming both classical ML and time–frequency algorithms (Sengupta et al., 1 Jan 2026).

A plausible implication is that further exploration of 1D UNet variants, augmented by pixel reorganization techniques and robust data augmentation schemes, may extend their utility across other structured modalities and downstream tasks—provided strict adherence to architectural principles that exploit separable spatial or temporal dependencies.
