Efficient Upsampling Convolution Block (EUCB)

Updated 23 January 2026
  • EUCB is a family of neural modules that efficiently upsamples feature maps using a sequence of operations to preserve edge and texture details.
  • It combines 2× upsampling, a 3×3 depthwise convolution with batch normalization and ReLU, and a 1×1 pointwise convolution for channel mixing and feature refinement.
  • Applications in object detection and video coding show improved mAP and bitrate reductions by replacing traditional interpolation methods with EUCB.

The Efficient Upsampling Convolution Block (EUCB) denotes a family of computationally efficient neural network modules designed to upsample intermediate feature maps while enhancing edge and texture fidelity, primarily within deep learning pipelines for image analysis and compression. Originating in contexts such as visual object detection (Han, 16 Jan 2026) and video coding (Li et al., 2017), EUCB architectures aim to reduce the artifacts of naive interpolation while imposing modest computational and parameter overhead.

1. Architectural Principles of EUCB

The core EUCB architecture, as instantiated in SME-YOLO (Han, 16 Jan 2026), consists of a sequence of four operations applied to an input feature map $F_{in}\in\mathbb{R}^{C\times H\times W}$:

  1. 2× Spatial Upsampling: $F^{(1)} = \uparrow_2 F_{in}$, typically using bilinear or nearest-neighbor interpolation, with negligible computational cost.
  2. 3×3 Depthwise Convolution: For each channel,

$$F^{(2)}_{c,u,v} = \sum_{i=-1}^{1}\sum_{j=-1}^{1} w^{\mathrm{dw}}_{c,i,j}\, F^{(1)}_{c,u+i,v+j}$$

(stride = 1, padding = 1).

  3. Batch Normalization and ReLU: $F^{(3)} = \mathrm{ReLU}(\mathrm{BN}(F^{(2)}))$.
  4. 1×1 Pointwise Convolution: Channel reduction or mixing,

$$F_{out,m,u,v} = \sum_{c=1}^{C} w^{\mathrm{pw}}_{m,c}\, F^{(3)}_{c,u,v}$$

mapping $C$ input channels to $C'$ output channels as needed.

This yields the compact formula:

$$F_{out} = \mathrm{Conv}_{1\times1}\bigl(\mathrm{ReLU}(\mathrm{BN}(\mathrm{DWC}_{3\times3}(\uparrow_2 F_{in})))\bigr)$$
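The four-step sequence above can be sketched directly in PyTorch. This is a minimal illustrative module, not the authors' reference code; the class name and the choice of bilinear interpolation are assumptions consistent with the description.

```python
# Hypothetical sketch of the EUCB sequence: upsample -> 3x3 depthwise conv
# -> BN -> ReLU -> 1x1 pointwise conv. Not the SME-YOLO reference code.
import torch
import torch.nn as nn

class EUCB(nn.Module):
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # Step 1: 2x spatial upsampling (bilinear assumed here).
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        # Step 2: 3x3 depthwise conv (groups == in_channels -> one filter per channel).
        self.dwconv = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                stride=1, padding=1, groups=in_channels, bias=False)
        # Step 3: batch normalization followed by ReLU.
        self.bn = nn.BatchNorm2d(in_channels)
        self.act = nn.ReLU(inplace=True)
        # Step 4: 1x1 pointwise conv mapping C -> C' channels.
        self.pwconv = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pwconv(self.act(self.bn(self.dwconv(self.up(x)))))

# Shape check: C=64 at 20x20 -> C'=32 at 40x40.
x = torch.randn(1, 64, 20, 20)
y = EUCB(64, 32)(x)
print(tuple(y.shape))  # (1, 32, 40, 40)
```

Because the depthwise convolution runs after upsampling, the learned 3×3 filters operate at the target resolution, which is what lets the block sharpen edges that plain interpolation would blur.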

In CNN-based video coding (Li et al., 2017), EUCB functionality is realized in a deeper up-sampling block comprising feature extraction, explicit multi-scale fusion, a 2× transposed convolution (deconvolution), and residual prediction, but the central aim remains efficient, high-fidelity upsampling.

2. Layerwise Specification and Variants

A detailed layerwise specification for SME-YOLO's EUCB is summarized below:

| Layer    | Operation      | Kernel | Stride | Padding | Activation/Norm   |
|----------|----------------|--------|--------|---------|-------------------|
| Up       | Upsample (×2)  | —      | —      | —       | —                 |
| DWC₃×₃   | Depthwise conv | 3×3    | 1      | 1       | BatchNorm → ReLU  |
| Conv₁×₁  | Pointwise conv | 1×1    | 1      | 0       | —                 |

In comparison, the five-layer upsampling block in (Li et al., 2017) incorporates:

  • Multi-scale feature extraction and reconstruction (3×3 and 5×5 branches with concatenation).
  • Learned deconvolution (9×9, stride 2) for spatial upsampling.
  • Final prediction via 5×5 kernel without activation.
  • Residual learning, where the CNN predicts the difference between DCT-based fixed upsampling and ground truth.
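The five components above can be composed into a hedged PyTorch sketch. Only the kernel sizes, the two multi-scale stages, the 9×9 stride-2 deconvolution, and the residual formulation follow the description in (Li et al., 2017); the branch widths (32 channels per branch) and the use of ReLU on intermediate layers are assumptions.

```python
# Illustrative sketch of the five-layer up-sampling block described above.
# Channel widths and intermediate activations are assumptions, not from the paper.
import torch
import torch.nn as nn

class MultiScaleStage(nn.Module):
    """Parallel 3x3 and 5x5 branches whose outputs are concatenated."""
    def __init__(self, in_ch: int, branch_ch: int):
        super().__init__()
        self.b3 = nn.Conv2d(in_ch, branch_ch, 3, padding=1)
        self.b5 = nn.Conv2d(in_ch, branch_ch, 5, padding=2)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(torch.cat([self.b3(x), self.b5(x)], dim=1))

class BlockUpsampler(nn.Module):
    def __init__(self):
        super().__init__()
        self.extract = MultiScaleStage(1, 32)          # multi-scale feature extraction
        self.deconv = nn.ConvTranspose2d(64, 64, 9, stride=2,
                                         padding=4, output_padding=1)  # learned 2x deconv
        self.recon = MultiScaleStage(64, 32)           # multi-scale reconstruction
        self.predict = nn.Conv2d(64, 1, 5, padding=2)  # final 5x5, no activation

    def forward(self, low_res, dctif_up):
        # Residual learning: the CNN predicts the difference between the
        # fixed DCT-based interpolation (dctif_up) and the ground truth.
        r = self.predict(self.recon(self.deconv(self.extract(low_res))))
        return dctif_up + r

x = torch.randn(1, 1, 16, 16)     # low-resolution luma block
base = torch.randn(1, 1, 32, 32)  # placeholder for the fixed DCTIF 2x upsampling
y = BlockUpsampler()(x, base)
print(tuple(y.shape))  # (1, 1, 32, 32)
```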

3. Feature Fusion and Information Flow

SME-YOLO's EUCB is a strictly sequential, single-path module, performing no explicit feature fusion across spatial scales within the block. Edge and texture details are enhanced via local spatial filtering (depthwise convolution) immediately after upsampling. In contrast, the design in (Li et al., 2017) employs explicit multi-scale fusion: parallel convolutions with heterogeneous receptive fields in two stages, their outputs concatenated before further processing, yielding richer context aggregation for restoring lost high-frequency components.

Importantly, neither approach utilizes attention or gating within the EUCB itself; spatial or channel attention, where employed, is delegated to surrounding modules.

4. Computational Efficiency and Parameterization

The efficiency of EUCB in SME-YOLO is analytically characterized as follows. Let $C$ and $C'$ denote the input and output channel counts, and $H'$ and $W'$ the upsampled spatial dimensions.

  • Depthwise 3×3 conv: $\approx 2H'W'\cdot 9C$ FLOPs
  • Pointwise 1×1 conv: $\approx 2H'W'\cdot C\cdot C'$ FLOPs
  • Upsample: negligible cost

Total parameters: $9C + C\cdot C'$ (depthwise plus pointwise).
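The cost model can be checked with a short back-of-envelope script. Multiply-accumulates are counted as 2 FLOPs each, matching the factors above; the specific values of $C$, $C'$, $H'$, $W'$ are illustrative, not from the paper.

```python
# Back-of-envelope FLOPs/parameter count for EUCB (illustrative values).
def eucb_cost(C: int, C_out: int, H_up: int, W_up: int):
    dw_flops = 2 * H_up * W_up * 9 * C        # 3x3 depthwise conv
    pw_flops = 2 * H_up * W_up * C * C_out    # 1x1 pointwise conv
    params = 9 * C + C * C_out                # depthwise + pointwise weights
    return dw_flops + pw_flops, params

flops, params = eucb_cost(C=256, C_out=128, H_up=80, W_up=80)
print(f"{flops/1e9:.3f} GFLOPs, {params/1e3:.1f}K params")  # 0.449 GFLOPs, 35.1K params
```

Note that the pointwise term dominates whenever $C' > 18$, so channel-count choices, not the depthwise filter, drive EUCB's cost at typical backbone widths.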

For typical object detection backbones, ablation results indicate that replacing the traditional bilinear upsample + Conv₁×₁ sequence with the EUCB in SME-YOLO increases model GFLOPs by ~1.0 (6.3 → 7.3) (Han, 16 Jan 2026).

In the video-coding variant, the five-layer module employs ≈474K parameters, predominantly from the 9×9 deconvolution operation, and achieves a ~4× speedup versus deeper super-resolution baselines (e.g., VDSR), while incurring significantly increased decoding computational cost compared to non-CNN HEVC anchors (Li et al., 2017).

5. Enhancement of Edge and Texture Fidelity

The use of learned convolutional filtering in the EUCB, especially the immediate placement of depthwise 3×3 convolutions after upsampling, has demonstrable effects on detail preservation. Empirical ablation in SME-YOLO shows that stacking EUCB on top of specialized loss functions (NWDLoss) yields an mAP@0.5 increase from 0.939 to 0.946 and visibly sharper defect contours for tiny PCB defects (see qualitative comparisons in Fig. 8 of (Han, 16 Jan 2026)). The refinement effect is attributed to improved representation of boundary details, which are otherwise smoothed or lost under pure interpolation.

Similarly, in intra-frame block upsampling (Li et al., 2017), the deep CNN-based module, with its multi-scale kernels and residual learning, outperforms both classical interpolation and deeper networks (VDSR) for intra-block texture reconstruction.

6. Integration Contexts and Detection/Compression Impact

In SME-YOLO, EUCB replaces every bilinear upsampling → feature-merge step within the neck (FPN/PAN), thereby enhancing the transfer of high-fidelity features between scales, especially critical for small-object detection. The resulting system, with EUCB and complementary modules (NWDLoss, MSFA), achieves final mAP@0.5 = 0.950, Precision = 0.970, and Recall = 0.910 on the PKU-PCB dataset—constituting state-of-the-art for tiny-defect detection on PCBs (Han, 16 Jan 2026).

Within CNN-based block upsampling for intra-frame coding, the block integrates into the High Efficiency Video Coding (HEVC) pipeline, supporting both luma and chroma upsampling, and deploys a two-stage upsampling (intra-loop, then frame-level refinement) to recover block-edge continuity. The integration enables significant bitrate reductions (on average 5.5% BD-rate reduction overall, 9.0% for UHD) relative to HEVC anchors (Li et al., 2017).

7. Extensions, Variants, and Optimization Strategies

Key architecture-level strategies for efficient upsampling convolution blocks across applications include:

  • Residual learning: CNN predicts high-frequency details over a fixed interpolation baseline (e.g., DCTIF), aiding stability and convergence in block video coding (Li et al., 2017).
  • Multi-scale fusion: Different receptive fields are fused via parallel branch concatenation before up/decoding, preferentially capturing both fine and coarse spatial structures, as in the five-layer block (Li et al., 2017).
  • Two-stage inference: Sequential application during and after coding loop to maximize border and contextual fidelity (video coding context).
  • Rate-distortion optimization: For video coding, block rate-distortion decisions during inference employ empirically scaled distortion/bitrate trade-offs, with adjustments to quantization parameters maintaining global optimality (Li et al., 2017).

A plausible implication is that the incremental complexity introduced by EUCB-type modules is justified when fine-scale feature reconstruction or boundary localization is critical to downstream tasks, especially in regimes dominated by small object or texture features.


References:

  • SME-YOLO: "SME-YOLO: A Real-Time Detector for Tiny Defect Detection on PCB Surfaces" (Han, 16 Jan 2026)
  • CNN Block Upsampling for Intra Frame Coding: "Convolutional Neural Network-Based Block Up-sampling for Intra Frame Coding" (Li et al., 2017)