
MeliusNet: Efficient BNN for Mobile Devices

Updated 22 June 2026
  • MeliusNet is a binary neural network that interleaves DenseBlocks with ImprovementBlocks to enhance feature quality while preserving efficiency.
  • It employs 1-bit quantization in key convolutional layers to dramatically reduce model size and computation, enabling effective hardware acceleration.
  • Empirical results on ImageNet demonstrate that MeliusNet variants match or exceed the accuracy of comparable 32-bit models like MobileNet with fewer operations.

MeliusNet is a binary neural network (BNN) architecture designed to close the accuracy gap between highly efficient 1-bit quantized networks and compact 32-bit architectures such as MobileNet-v1, particularly for inference on resource-constrained devices. Developed by Bethge et al. (HPI & Alibaba AI Labs), MeliusNet interleaves capacity-increasing DenseBlocks with a novel residual-like ImprovementBlock, providing improvements in feature representation quality without incurring the operational overhead typical of hybrid or multi-binary schemes. The architecture supports training and inference with binary weights and activations throughout most layers, resulting in significant reductions in model size and computation, as empirically validated on the ImageNet ILSVRC2012 benchmark (Bethge et al., 2020).

1. Core Architectural Components

MeliusNet is composed of repeated alternations between two primary blocks:

  • DenseBlock: Starting from a feature map of $c$ channels (real-valued activations after BatchNorm), a single $3\times3$ 1-bit convolution generates $k=64$ new channels. The output is the concatenation of the input and the $k$ new channels, giving $c+k$ channels. This mechanism increases network capacity with minimal real-valued computation.
  • ImprovementBlock: Operating on a feature map of $c+k$ channels (the output of a DenseBlock), a $3\times3$ 1-bit convolution again produces $k$ channels. These outputs are added residually to the last $k$ channels of the input, while the preceding $c$ channels pass through unchanged, so the block improves the quality of the newly concatenated features.
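The channel bookkeeping of the two blocks can be sketched in a few lines of Python. This is a toy illustration, not the authors' implementation: each channel is reduced to a single scalar, and `toy_conv` is a hypothetical stand-in for the 1-bit convolution, chosen only so that the concatenation and the partial residual addition are easy to follow.

```python
from typing import Callable, List

K = 64  # growth rate k from the text

Conv = Callable[[List[float]], List[float]]


def dense_block(x: List[float], conv: Conv) -> List[float]:
    """Concatenate the input with k new channels: c -> c + k channels."""
    new = conv(x)
    assert len(new) == K
    return x + new


def improvement_block(x: List[float], conv: Conv) -> List[float]:
    """Add the conv output residually to the last k channels only."""
    delta = conv(x)
    assert len(delta) == K
    kept, last = x[:-K], x[-K:]
    # The first c channels pass through unchanged.
    return kept + [a + d for a, d in zip(last, delta)]


def toy_conv(x: List[float]) -> List[float]:
    """Hypothetical placeholder for a 1-bit convolution.

    Binarizes the input with sign() and emits k identical channels,
    each equal to the sum of the binarized inputs.
    """
    s = sum(1.0 if v >= 0 else -1.0 for v in x)
    return [s] * K


x = [0.5] * 128                      # c = 128 input channels
y = dense_block(x, toy_conv)         # 192 channels
z = improvement_block(y, toy_conv)   # still 192 channels
```

In this sketch only the last 64 entries of `z` differ from `y`, which mirrors how the ImprovementBlock leaves earlier features untouched.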

A MeliusNet stage is constructed by repeating the [DenseBlock → ImprovementBlock] pattern a stage-specific number of times. Transition layers between stages apply spatial downsampling using max-pooling and reduce channel dimensions with grouped $1\times1$ real-valued convolutions. For smaller models, 2- or 4-grouped $1\times1$ transitions combined with channel shuffling further reduce operational cost.

2. Binary Quantization and Training

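All $3\times3$ convolutions use binary quantization for both weights $w$ and activations $a$. Forward quantization is defined as:

$$\operatorname{sign}(x) = \begin{cases} +1 & \text{if } x \ge 0, \\ -1 & \text{otherwise,} \end{cases}$$

where $x$ is a real-valued variable. The backward pass uses a straight-through estimator (STE) with clipping to manage gradient flow during binarization:

$$\frac{\partial \ell}{\partial x} = \begin{cases} \dfrac{\partial \ell}{\partial \operatorname{sign}(x)} & \text{if } |x| \le t_{\text{clip}}, \\ 0 & \text{otherwise,} \end{cases}$$

where $\ell$ is the loss and $t_{\text{clip}}$ is the clipping threshold.

No per-channel or per-layer scaling factors are applied; the authors report no empirical benefit from them when BatchNorm is used. The first convolution (real-valued), all $1\times1$ downsampling convolutions, and the final fully-connected layer remain non-binarized.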
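As a minimal scalar sketch of the forward and backward rules described above: the function names are illustrative, and the default clipping threshold `t_clip=1.0` is an assumption rather than a value taken from the paper.

```python
def binarize(x: float) -> float:
    """Forward pass: sign quantization, mapping x >= 0 to +1 and x < 0 to -1."""
    return 1.0 if x >= 0 else -1.0


def ste_grad(x: float, upstream: float, t_clip: float = 1.0) -> float:
    """Backward pass: straight-through estimator with clipping.

    The upstream gradient passes through unchanged when |x| <= t_clip
    and is zeroed otherwise. t_clip = 1.0 is an assumed default.
    """
    return upstream if abs(x) <= t_clip else 0.0


inside = ste_grad(0.5, upstream=2.0)    # gradient passes through
outside = ste_grad(1.5, upstream=2.0)   # gradient is clipped to zero
```

Zeroing the gradient for large `|x|` keeps latent values from drifting far from the binarization boundary, where further updates would no longer change the sign.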

3. Efficient Binary Convolution

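MeliusNet’s binary convolution is implemented with bit-wise operations that map directly onto hardware instructions. Each output value is a dot product

$$\mathbf{a} \cdot \mathbf{w} = \sum_{i=1}^{n} a_i w_i,$$

where $a_i$ and $w_i$ are constrained to $\{-1, +1\}$. Hardware implementations use XNOR and popcount instructions:

$$\operatorname{popcount}\big(\operatorname{xnor}(\mathbf{a}, \mathbf{w})\big)$$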

The resulting count can be re-centered by a simple affine transformation.
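The equivalence between the $\pm1$ dot product and the XNOR/popcount form can be checked with plain Python integers. This is an illustrative sketch: the bit-packing convention (bit set for $+1$, cleared for $-1$) and the helper names are assumptions made here. If $p$ positions match out of $n$, the dot product is $p - (n - p) = 2p - n$, which is the affine re-centering mentioned above.

```python
from typing import Sequence


def pack(values: Sequence[int]) -> int:
    """Pack +1/-1 values into an int: bit i is 1 for +1 and 0 for -1."""
    bits = 0
    for i, v in enumerate(values):
        if v == 1:
            bits |= 1 << i
    return bits


def binary_dot(a_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two packed +1/-1 vectors of length n."""
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ w_bits) & mask   # 1 wherever the two signs agree
    matches = bin(xnor).count("1")     # popcount
    return 2 * matches - n             # re-center the count to [-n, n]


a = [1, -1, 1, 1, -1, -1, 1, -1]
w = [1, 1, -1, 1, -1, 1, 1, -1]
result = binary_dot(pack(a), pack(w), len(a))
reference = sum(x * y for x, y in zip(a, w))
```

On real hardware the same idea is applied to machine words, so one XNOR and one popcount replace a full word's worth of multiply-accumulates.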

4. Model Variants and Specification

MeliusNet scales across several model sizes. The table summarizes four representative variants in terms of stage configuration, operations (OPs, in units of $10^8$), model size, and ImageNet accuracy. Binary operations (BOPs) are combined with real-valued floating-point operations (FLOPs) via a $1/64$ weighting of the binary operations, consistent with BNN benchmarks.

Model          Block Repeats per Stage   OPs ($\cdot 10^8$)   Size (MB)   Top-1 / Top-5 Accuracy
MeliusNet-22   (4, 5, 4, 4)              2.08                 3.9         63.6% / 84.7%
MeliusNet-29   (4, 6, 8, 6)              2.14                 5.1         65.8% / 86.2%
MeliusNet-42   (5, 8, 14, 10)            3.25                 10.1        69.2% / 88.3%
MeliusNet-59   (6, 12, 24, 12)           5.25                 17.4        71.0% / 89.7%

Channel counts are kept as multiples of a fixed base value through the grouped transition layers.
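The combined operation count can be computed with a one-line helper. This is a sketch under an assumption: the default weighting of $1/64$ per binary operation follows a convention common in BNN benchmarks and is exposed as a parameter here; the input numbers in the example are hypothetical and do not correspond to any MeliusNet variant.

```python
def combined_ops(flops: float, bops: float, binary_weight: float = 1 / 64) -> float:
    """Combine real-valued and binary operation counts into one OPs figure.

    binary_weight is the cost of one binary operation relative to one
    floating-point operation (assumed default: 1/64).
    """
    return flops + binary_weight * bops


# Hypothetical network: 1.0e8 FLOPs in real-valued layers, 6.4e9 binary operations.
total = combined_ops(flops=1.0e8, bops=6.4e9)
```

With these hypothetical inputs the binary layers contribute as much to the total as the real-valued layers, even though they perform 64 times more raw operations.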

5. Training Methodology and Optimization

MeliusNet is trained from scratch on ILSVRC2012 using:

  • Data augmentation: Random resized cropping to $224\times224$, horizontal flip with probability $0.5$, mean/std normalization.
  • Initialization: Glorot (Xavier) uniform.
  • Optimizer: RAdam, with momentum coefficients $\beta_1$ and $\beta_2$ and weight decay.
  • Learning rate schedule: Cosine annealing from an initial rate $\eta_0$, with linear warmup over the first 5 epochs.

    $$\eta_t = \frac{\eta_0}{2}\left(1 + \cos\left(\frac{\pi t}{T}\right)\right)$$

    where $t$ is the current training step and $T$ is the total number of steps.

  • Batch size: A fixed number of images per GPU; total batch size depends on available GPUs.
  • Epochs: 120; loss: cross-entropy.
  • Exempt real-valued layers: The stem convolution, all $1\times1$ transitions, and the final output layer.
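The learning rate schedule from the list above (linear warmup followed by cosine annealing) might look like the following sketch. How warmup and annealing are composed is an assumption: here the rate rises linearly from zero during warmup, and the cosine phase is measured from the end of warmup. The function name and the `base_lr` parameter are illustrative.

```python
import math


def learning_rate(epoch: float, base_lr: float,
                  total_epochs: int = 120, warmup_epochs: int = 5) -> float:
    """Linear warmup to base_lr, then cosine annealing down to zero."""
    if epoch < warmup_epochs:
        # Assumed warmup shape: linear ramp starting at zero.
        return base_lr * epoch / warmup_epochs
    # Fraction of the post-warmup schedule completed, in [0, 1].
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


# Sample the schedule at a few epochs, with base_lr normalized to 1.0.
schedule = [learning_rate(e, base_lr=1.0) for e in (0, 2.5, 5, 62.5, 120)]
```

Sampling a few epochs this way is a quick check that the two phases join without a jump at the end of warmup.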

6. Empirical Performance on ImageNet

MeliusNet achieves competitive or superior accuracy relative to both quantized and real-valued compact networks, under comparable constraints on operations and memory.

Cross-domain results against MobileNet-v1:

  • MobileNet-v1 0.5: $1.49\cdot10^8$ OPs, 4.7 MB, 63.7%
  • MeliusNet-C (~0.5x width): comparable OPs, 4.5 MB, 64.1% (+0.4)
  • MobileNet-v1 0.75: $3.25\cdot10^8$ OPs, 10 MB, 68.4%
  • MeliusNet-42: $3.25\cdot10^8$ OPs, 10 MB, 69.2% (+0.8)
  • MobileNet-v1 1.0: $5.69\cdot10^8$ OPs, 17 MB, 70.6%
  • MeliusNet-59: $5.25\cdot10^8$ OPs, 17 MB, 71.0% (+0.4)

Versus other BNN/quantized baselines (similar OPs/sizes):

  • Bi-RealNet 18 (grouped stem): ~4 MB, 60.6%
  • Bi-RealNet 34 (grouped stem): ~5 MB, 63.7%
  • BinaryDenseNet 28: 4.5 MB, 62.6%
  • BinaryDenseNet 37: 5.1 MB, 64.2%
  • MeliusNet A (4,5,5,6)/4g: 4.0 MB, 63.4%
  • MeliusNet B (4,6,8,6)/2g: 5.0 MB, 65.7%

Quantized baselines with 1–2 bit activations and 2–32 bit weights typically require either much larger OPs or model sizes to match similar accuracy.

7. Analysis of Architectural Advantages

The MeliusNet design strategically addresses the primary challenge of BNNs—recovering representational power lost through binarization—while keeping computation balanced and compatible with highly efficient hardware implementations.

  • Capacity versus quality: DenseBlocks increase binary channel count; subsequent ImprovementBlocks selectively enhance just the new channels, limiting residual diffusion across previous features and improving representational quality.
  • Balanced binary computation: Alternation of DenseBlock and ImprovementBlock pairs maintains roughly constant bitwise operation counts per layer, thereby avoiding the quadratic operation growth seen in naive Dense+ResNet hybrids.
  • Efficient use of real-valued layers: Bottlenecked, grouped real-valued convolutions (stem and transitions) reduce non-binary operation share by approximately 40% compared to prior BNNs, wherein 32-bit computation could account for up to 75% of operations.
  • Optimization methodology: Training stability and convergence are significantly enhanced by RAdam with annealing, batch normalization, and STE gradient clipping, outperforming standard SGD baselines.

MeliusNet demonstrates that architectural innovations—rather than wider or deeper channel expansions or complex multi-binary basis approaches—can enable 1-bit-per-weight-and-activation networks to match or exceed the accuracy of leading compact 32-bit models on large-scale vision tasks (Bethge et al., 2020).
