Sandglass Block for Efficient CNN Architectures

Updated 23 June 2026

The sandglass block is a CNN module that reorders operations by performing spatial convolutions at high dimensionality before reduction, preserving essential feature information.
It enhances gradient propagation and mitigates information loss relative to inverted residual blocks, leading to measurable gains in classification and detection performance.
Empirical results on benchmarks such as ImageNet and Pascal VOC show that integrating the sandglass block improves accuracy at equal or lower computational cost.

The sandglass block is a convolutional neural network module designed to address the limitations of the inverted residual block, with particular emphasis on mitigating information loss and gradient confusion. Introduced by Zhou et al. as a novel bottleneck structure for efficient mobile network design, the sandglass block reverses the architectural flow of the inverted residual block, performing identity mapping and spatial transformations at higher-dimensional representations. This structure empirically yields improved classification accuracy and detection performance on major benchmarks (ImageNet, Pascal VOC) while maintaining or reducing computational cost and parameter count (Daquan et al., 2020).

1. Architectural Design and Layer Structure

The sandglass block (SGBlock) fundamentally reorders the conventional inverted residual paradigm found in MobileNetV2. While the inverted residual block compresses features to a low-dimensional bottleneck, applies spatial operations at this bottleneck, and then expands back, the sandglass block maintains high-dimensional representations for both the shortcut and spatial convolutions before the dimensionality reduction step. The detailed layer-by-layer structure is as follows:

Step 1: Depthwise 3×3 convolution, stride 1, input and output channels M, followed by ReLU6 activation.
Step 2: Pointwise 1×1 convolution reducing channels from M to $M/t$ (t = channel-reduction ratio), with linear activation.
Step 3: Pointwise 1×1 convolution expanding channels from $M/t$ to N, followed by ReLU6.
Step 4: Depthwise 3×3 convolution, stride s, output channels N, linear activation.

The shortcut connection is added at the high-dimensional stage (M channels) and only when input and output shapes match (i.e., stride 1 and $M = N$). Optionally, an “identity multiplier” α can be introduced to restrict the shortcut to αM channels for computational savings.

The architecture promotes spatial processing at high dimension before bottleneck compression, with explicit mathematical form: $\hat G = \phi_{1,p}(\phi_{1,d}(F)), \qquad G = \phi_{2,d}(\phi_{2,p}(\hat G)) + I(F)$ where $\phi_{i,p}$ and $\phi_{i,d}$ denote pointwise and depthwise convolutions, respectively, and $I(F)$ is the identity mapping (Daquan et al., 2020).
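
The four steps and the shortcut rule above can be written down as a small layer plan. The following is a minimal sketch in plain Python, not the authors' code: the names `LayerSpec` and `sandglass_plan` are hypothetical, batch normalization is omitted, the example widths are illustrative, and rounding $M/t$ with integer division is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LayerSpec:
    kind: str        # "depthwise" or "pointwise"
    kernel: int      # spatial kernel size
    in_ch: int
    out_ch: int
    stride: int
    activation: str  # "relu6" or "linear"


def sandglass_plan(M: int, N: int, t: int, s: int = 1) -> Tuple[List[LayerSpec], bool]:
    """Return the four layers of a sandglass block and whether the shortcut applies."""
    reduced = M // t  # assumption: integer division for the bottleneck width
    layers = [
        LayerSpec("depthwise", 3, M, M, 1, "relu6"),         # Step 1
        LayerSpec("pointwise", 1, M, reduced, 1, "linear"),  # Step 2
        LayerSpec("pointwise", 1, reduced, N, 1, "relu6"),   # Step 3
        LayerSpec("depthwise", 3, N, N, s, "linear"),        # Step 4
    ]
    # The shortcut is only valid when the input and output shapes match.
    use_shortcut = (s == 1 and M == N)
    return layers, use_shortcut


# Illustrative widths only; they are not taken from the text above.
layers, use_shortcut = sandglass_plan(M=96, N=96, t=6, s=1)
for layer in layers:
    print(layer)
print("shortcut:", use_shortcut)
```

Keeping the plan as data rather than as framework modules makes the channel bookkeeping easy to inspect before committing to a particular deep-learning library.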

2. Theoretical Underpinnings and Design Rationale

Fundamental to the sandglass block is the hypothesis that shortcuts and major spatial operations in high-dimensional feature space maximize information preservation and improve gradient propagation during training. The inverted residual block’s low-dimensional shortcut risks discarding latent feature information and exacerbating gradient confusion, a phenomenon wherein competing gradient directions degrade convergence.

Key theoretical motivations:

Wide Shortcuts: By enabling the residual connection at the full channel width, more input feature information is propagated.
Enhanced Gradient Flow: With broader residual paths, the backpropagated gradients are spread over more channels, dampening the gradient confusion observed with narrow shortcuts (described by Sankararaman et al., 2019, and discussed in Daquan et al., 2020).
Depthwise Convolution Placement: Placing depthwise convolutions at both ends (high-dimensional space) rather than only at the bottleneck enhances spatial encoding while maintaining efficiency due to their low computational cost.
Linear Activations: Following the principle of linear bottleneck from MobileNetV2, the block applies linear activations after channel reduction and as the final projection, further preventing signal collapse.

This structure enables residual mappings and spatial transformations to be realized in both a computationally efficient and representationally robust manner.

3. Empirical Performance and Benchmarks

Empirical evaluation demonstrates the effectiveness of the sandglass block as a drop-in replacement for the inverted residual block across major tasks:

ImageNet Classification (224×224):
- MobileNetV2-1.0: 3.5M params, 300M MAdds → 72.3% top-1 accuracy
- MobileNeXt-1.0 (with SGBlock): 3.4M params, 300M MAdds → 74.0% top-1 (+1.7%)
- Larger gains for reduced-width models (up to +2.3% at 0.5× width).
Post-Training Quantization:
- MobileNetV2: 65.1%
- MobileNeXt: 68.6% (+3.5%)
Object Detection (Pascal VOC 2007, SSDLite320):
- MobileNetV2-1.0: mAP = 71.7%
- MobileNeXt-1.0: mAP = 72.6% (+0.9%)
Neural Architecture Search (DARTS on CIFAR-10):
- DARTS baseline: error 3.11%, 3.25M params
- + SGBlock: error 2.98%, 2.45M params (−25% params, −0.13% error)

These results indicate systematic improvement in representational efficiency and task performance without parameter or computation overhead (Daquan et al., 2020).
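
The deltas quoted above follow directly from the listed figures. A short arithmetic check (the dictionary layout is only for illustration) shows that the accuracy and error differences are percentage points, and that the DARTS parameter saving is about 24.6%, which the list rounds to 25%.

```python
# Reported figures copied from the list above: (baseline, with sandglass block).
results = {
    "imagenet_top1": (72.3, 74.0),
    "ptq_top1": (65.1, 68.6),
    "voc_map": (71.7, 72.6),
    "darts_error": (3.11, 2.98),
}
for name, (base, new) in results.items():
    # Absolute difference, i.e. percentage points rather than relative change.
    print(f"{name}: {new - base:+.2f} percentage points")

# DARTS parameter counts in millions: (baseline, with sandglass block).
darts_params_m = (3.25, 2.45)
rel = (darts_params_m[1] - darts_params_m[0]) / darts_params_m[0]
print(f"DARTS parameters: {rel:+.1%}")
```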

4. Mathematical Formalism and Channel Manipulation

The channel manipulation in the sandglass block is governed by precise formulas:

Channel Reduction: $C_r = M / t$
Channel Expansion: $C_e = N$

Notably, the classic inverted residual block expands its narrow input to $C_{mid} = t M$ channels and then projects back down, whereas the sandglass block reduces first and then expands. For the shortcut, the optional “identity multiplier” α restricts the addition to a subset (the first αM channels) to reduce computational overhead.
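
As a sketch of how the identity multiplier could act, the snippet below adds the input back onto only the first αM channels of the block output. Feature maps are reduced to one number per channel to keep the example dependency-free; the function name `partial_shortcut` and the use of `int(alpha * M)` for the channel count are assumptions, not details from the paper.

```python
from typing import List


def partial_shortcut(block_out: List[float], block_in: List[float], alpha: float) -> List[float]:
    """Add the input to the first alpha*M channels of the output; pass the rest through."""
    if len(block_out) != len(block_in):
        raise ValueError("shortcut requires matching channel counts (M == N)")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    k = int(alpha * len(block_in))  # assumption: round down to a whole channel count
    return [o + i if c < k else o for c, (o, i) in enumerate(zip(block_out, block_in))]


x = [1.0, 2.0, 3.0, 4.0]      # one value per input channel (M = 4)
y = [10.0, 10.0, 10.0, 10.0]  # block output before the residual add
print(partial_shortcut(y, x, alpha=0.5))  # only channels 0 and 1 receive the shortcut
print(partial_shortcut(y, x, alpha=1.0))  # full-width shortcut
```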

The criteria for shortcut usage are:

Stride $s = 1$ (no spatial downsampling)
Channels match ($M = N$)

Feature aggregation at high dimension underpins the preservation of signal and robustness to gradient vanishing (Daquan et al., 2020).
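
To make the channel arithmetic concrete, the weight count of each block can be derived from the layer shapes alone. The sketch below uses the standard counts for depthwise ($k^2 C$) and pointwise ($C_{in} C_{out}$) convolutions and ignores biases and batch normalization, which is a simplifying assumption; the function names and the example widths are illustrative. In both calls the wide representation has 96 channels and the narrow one 16.

```python
def sandglass_params(M: int, N: int, t: int, k: int = 3) -> int:
    """Weights in DW(M) -> PW(M -> M/t) -> PW(M/t -> N) -> DW(N)."""
    r = M // t  # reduced width C_r = M / t
    return k * k * M + M * r + r * N + k * k * N


def inverted_residual_params(M: int, N: int, t: int, k: int = 3) -> int:
    """Weights in PW(M -> tM) -> DW(tM) -> PW(tM -> N)."""
    mid = t * M  # expanded width C_mid = t * M
    return M * mid + k * k * mid + mid * N


print(sandglass_params(M=96, N=96, t=6))          # wide input, narrow middle
print(inverted_residual_params(M=16, N=16, t=6))  # narrow input, wide middle
```

With these widths the two blocks spend the same number of weights on pointwise layers; the sandglass block differs only by its second depthwise layer (9 × 96 = 864 weights), which is why placing depthwise convolutions at both ends stays cheap.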

5. Implementation Details and Best Practices

Activation Functions: ReLU6 follows the initial depthwise and the second pointwise convolutions; the first pointwise (reduction) and the final depthwise convolutions use linear activation.
Kernel Sizes: 3×3 for depthwise, 1×1 for pointwise convolutions.
Hyperparameters: Default reduction ratio $t = 6$ (first block $t = 2$), batch size 256, cosine learning-rate schedule (initial LR 0.05), 200 training epochs with SGD (momentum 0.9, weight decay $4 \times 10^{-5}$).
Shortcut with Identity Multiplier: Tuning $\alpha$ allows precise latency/accuracy control by reducing the shortcut's width.
Framework: Models trained in PyTorch.

These guidelines facilitate integration of sandglass blocks into standard training regimes for efficient mobile architectures (Daquan et al., 2020).
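
The cosine schedule mentioned above can be sketched as follows. The text gives only the initial rate (0.05) and the epoch count (200); the per-epoch granularity, the decay to zero, and the absence of warm-up are assumptions made for illustration, and `cosine_lr` is a hypothetical helper.

```python
import math


def cosine_lr(epoch: int, total_epochs: int = 200, base_lr: float = 0.05) -> float:
    """Cosine-annealed learning rate, from base_lr at epoch 0 down to 0 at total_epochs."""
    # Clamp the epoch so out-of-range values do not push the rate outside [0, base_lr].
    progress = min(max(epoch, 0), total_epochs) / total_epochs
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


for epoch in (0, 50, 100, 150, 200):
    print(epoch, round(cosine_lr(epoch), 5))
```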

6. Comparison with Other Bottleneck Structures

A direct comparison between sandglass and inverted residual blocks illuminates the critical differences:

Property	Inverted Residual Block	Sandglass Block
Shortcut location	Low-dimension (bottleneck)	High-dimension (input)
Main spatial convolution	Depthwise in bottleneck	Depthwise at both ends
Channel manipulation	Expand → DW → Project	DW → Reduce → Expand → DW
Linear activation	On projection (“linear bottleneck”)	On reduce/projection steps
Gradient/Info preservation	Lower (risk of loss/confusion)	Higher (wider path)

This inversion in flow ordering underlies the empirical improvements and motivates the sandglass block’s design over MobileNetV2’s inverted residuals (Daquan et al., 2020).
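
The first three rows of the comparison can be traced as channel widths. This is a sketch only: it assumes equal input and output widths, and the function names and the widths 16 and 96 are illustrative.

```python
from typing import Dict, List, Tuple, Union

Flow = Dict[str, Union[int, List[Tuple[str, int]]]]


def inverted_residual_flow(narrow: int, t: int) -> Flow:
    """Expand -> DW -> Project; the shortcut joins the narrow ends."""
    wide = narrow * t
    steps = [("pointwise expand", wide), ("depthwise 3x3", wide), ("pointwise project", narrow)]
    return {"shortcut_width": narrow, "steps": steps}


def sandglass_flow(wide: int, t: int) -> Flow:
    """DW -> Reduce -> Expand -> DW; the shortcut joins the wide ends."""
    narrow = wide // t
    steps = [("depthwise 3x3", wide), ("pointwise reduce", narrow),
             ("pointwise expand", wide), ("depthwise 3x3", wide)]
    return {"shortcut_width": wide, "steps": steps}


for name, flow in (("inverted residual", inverted_residual_flow(16, 6)),
                   ("sandglass", sandglass_flow(96, 6))):
    print(name, flow)
```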

7. Implications and Integration in Modern Architectures

The sandglass block’s architecture is immediately applicable as a direct replacement for inverted residual blocks in mobile computer vision models, with demonstrated benefits in classification, detection, and neural architecture search tasks. Its principles of wide residual shortcuts and depthwise convolution at high dimensions generalize to other resource-constrained learning settings and offer a template for future bottleneck block design.

A plausible implication is the potential for further optimization of block structure in both mobile and large-scale architectures by prioritizing high-dimensional information flow and spatial transformation prior to dimensionality reduction. Integration in neural architecture search pipelines leverages these advantages to discover compact yet high-performing networks (Daquan et al., 2020).

References

Daquan et al. (2020). Rethinking Bottleneck Structure for Efficient Mobile Network Design.
