MeliusNet: Efficient BNN for Mobile Devices

Updated 22 June 2026

MeliusNet is a binary neural network that interleaves DenseBlocks with ImprovementBlocks to enhance feature quality while preserving efficiency.
It employs 1-bit quantization in key convolutional layers to dramatically reduce model size and computation, enabling effective hardware acceleration.
Empirical results on ImageNet demonstrate that MeliusNet variants match or exceed the accuracy of comparable 32-bit models like MobileNet with fewer operations.

MeliusNet is a binary neural network (BNN) architecture designed to close the accuracy gap between highly efficient 1-bit quantized networks and compact 32-bit architectures such as MobileNet-v1, particularly for inference on resource-constrained devices. Developed by Bethge et al. (HPI & Alibaba AI Labs), MeliusNet interleaves capacity-increasing DenseBlocks with a novel residual-like ImprovementBlock, providing improvements in feature representation quality without incurring the operational overhead typical of hybrid or multi-binary schemes. The architecture supports training and inference with binary weights and activations throughout most layers, resulting in significant reductions in model size and computation, as empirically validated on the ImageNet ILSVRC2012 benchmark (Bethge et al., 2020).

1. Core Architectural Components

MeliusNet is composed of repeated alternations between two primary blocks:

DenseBlock: Starting from a feature map of $c$ channels (real-valued, post-BatchNorm activations that are binarized at the convolution input), a single $3\times3$ 1-bit convolution generates $k=64$ new channels. These are concatenated with the input, giving $c+k$ channels. This increases network capacity with minimal real-valued computation.
ImprovementBlock: Operating on the $c+k$-channel output of a DenseBlock, a $3\times3$ 1-bit convolution again produces $k$ channels. These are added residually to the last $k$ channels of the input, while the preceding $c$ channels propagate unchanged, so the block improves the quality of the newly concatenated features.

A MeliusNet stage is constructed by repeating the [DenseBlock → ImprovementBlock] pattern $n$ times, where $n$ is the block-repeat count of that stage. Transition layers between stages apply spatial downsampling using max-pooling and reduce channel dimensions with grouped $1\times1$ real-valued convolutions. For smaller models, further optimization uses 2- or 4-grouped $1\times1$ transitions and channel shuffling to reduce operational cost.
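
The channel bookkeeping of the two blocks can be illustrated with a minimal sketch. It is not the authors' implementation: `binary_conv_stub` is a hypothetical stand-in for the $3\times3$ 1-bit convolution, each "channel" is reduced to a single float so that only the channel arithmetic is visible, and the 128 input channels and 4 repeats are arbitrary example values.

```python
import random

K = 64  # channels added per DenseBlock, as stated in the text


def binary_conv_stub(features, out_channels, seed):
    """Hypothetical stand-in for a 3x3 1-bit convolution.

    It only fixes the number of output channels; the values are arbitrary.
    """
    rng = random.Random(seed)
    sign_sum = sum(1.0 if f >= 0 else -1.0 for f in features)
    return [sign_sum * rng.uniform(-1.0, 1.0) for _ in range(out_channels)]


def dense_block(features, seed):
    # Concatenate K newly computed channels: c -> c + K.
    return features + binary_conv_stub(features, K, seed)


def improvement_block(features, seed):
    # Add a K-channel residual to the last K channels only;
    # all earlier channels pass through unchanged.
    update = binary_conv_stub(features, K, seed)
    kept, newest = features[:-K], features[-K:]
    return kept + [f + u for f, u in zip(newest, update)]


def stage(features, repeats):
    # Repeat the [DenseBlock -> ImprovementBlock] pair `repeats` times.
    for i in range(repeats):
        features = dense_block(features, seed=2 * i)
        features = improvement_block(features, seed=2 * i + 1)
    return features


rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(128)]  # example input width
y = stage(x, repeats=4)
print(len(x), len(y))  # 128 384
```

Each pair adds $k=64$ channels, so a stage with $n$ repeats grows the width by $64n$, and the original input channels reach the end of the stage untouched.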

2. Binary Quantization and Training

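All $3\times3$ convolutions inside DenseBlocks and ImprovementBlocks use binary quantization for both weights $w$ and activations $a$. Forward quantization is defined as:

$$\mathrm{sign}(x) = \begin{cases} +1 & \text{if } x \ge 0 \\ -1 & \text{otherwise} \end{cases}$$

where $x$ is a real-valued variable. The backward pass uses a straight-through estimator (STE) with clipping to manage gradient flow during binarization:

$$\frac{\partial \ell}{\partial x} = \begin{cases} \dfrac{\partial \ell}{\partial\, \mathrm{sign}(x)} & \text{if } |x| \le t_{\mathrm{clip}} \\ 0 & \text{otherwise} \end{cases}$$

where $\ell$ is the loss and $t_{\mathrm{clip}}$ is the clipping threshold.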

No per-channel or per-layer scaling factors are applied, since they were reported to bring no empirical benefit when BatchNorm is used. The first (stem) convolution, all $1\times1$ downsampling convolutions, and the final fully-connected layer remain real-valued.
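
The forward binarization and the clipped straight-through gradient can be sketched for scalar inputs as follows. This is an illustration under stated assumptions rather than the reference implementation: the threshold `t_clip=1.0` is a placeholder value, and mapping an input of exactly 0 to +1 is a convention chosen here.

```python
def binarize(x):
    """Forward pass: map a real value to +1 or -1 (0 maps to +1 here)."""
    return 1.0 if x >= 0 else -1.0


def ste_backward(x, grad_output, t_clip=1.0):
    """Backward pass: pass the gradient through where |x| <= t_clip, else 0.

    t_clip=1.0 is an assumed placeholder, not a value from the source.
    """
    return grad_output if abs(x) <= t_clip else 0.0


inputs = [-2.0, -0.3, 0.0, 0.7, 1.5]
forward = [binarize(v) for v in inputs]
backward = [ste_backward(v, grad_output=0.5) for v in inputs]
print(forward)   # [-1.0, -1.0, 1.0, 1.0, 1.0]
print(backward)  # [0.0, 0.5, 0.5, 0.5, 0.0]
```

Inputs far from zero receive no gradient, which keeps latent real-valued weights from drifting without bound while their sign stays fixed.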

3. Efficient Binary Convolution

MeliusNet’s binary convolution is implemented as bit-wise operations conducive to direct hardware acceleration:

$$y = \sum_{i=1}^{N} w_i\, a_i$$

where $w_i$ and $a_i$ are constrained to $\{-1,+1\}$. Hardware implementations use $\mathrm{xnor}$ and $\mathrm{popcount}$ instructions:

$$p = \mathrm{popcount}\big(\mathrm{xnor}(w, a)\big)$$

The resulting count can be re-centered by a simple affine transformation.
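
The bitwise formulation can be checked against a plain dot product with a small sketch. The bit packing below (bit $i$ set when element $i$ is $+1$) and the helper names are illustrative choices, and Python integers stand in for machine words.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an int: bit i is 1 where values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(w_bits, a_bits, n):
    """Dot product of two {-1,+1} vectors of length n from their packed bits."""
    mask = (1 << n) - 1
    matches = bin(~(w_bits ^ a_bits) & mask).count("1")  # popcount(xnor)
    return 2 * matches - n  # affine re-centering: matches minus mismatches


w = [1, -1, -1, 1, 1, -1, 1, 1]
a = [1, 1, -1, -1, 1, -1, -1, 1]
reference = sum(wi * ai for wi, ai in zip(w, a))
fast = xnor_popcount_dot(pack_bits(w), pack_bits(a), len(w))
print(reference, fast)  # 2 2
```

With $p$ matching positions out of $N$, the dot product is $p - (N - p) = 2p - N$, which is the affine re-centering applied in the last line of `xnor_popcount_dot`.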

4. Model Variants and Specification

MeliusNet is scalable across several model sizes. The table summarizes four representative variants in terms of stage configuration, operations (OPs, in units of $10^8$), model size, and ImageNet accuracy. BOPs (binary operations) are combined with FLOPs (real floating-point operations) via a $1/64$ weighting, i.e. $\text{OPs} = \text{FLOPs} + \text{BOPs}/64$, consistent with BNN benchmarks.

Model	Block Repeats per Stage	OPs ($\times 10^8$)	Size (MB)	Top-1 / Top-5 Accuracy
MeliusNet-22	(4, 5, 4, 4)	2.08	3.9	63.6% / 84.7%
MeliusNet-29	(4, 6, 8, 6)	2.14	5.1	65.8% / 86.2%
MeliusNet-42	(5, 8, 14, 10)	3.25	10.1	69.2% / 88.3%
MeliusNet-59	(6, 12, 24, 12)	5.25	17.4	71.0% / 89.7%

Grouped transition layers keep the channel counts at multiples of a fixed base value (see Bethge et al., 2020).
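
The combined operation count used in the table can be sketched as below, assuming the $1/64$ weighting of binary operations that is common in BNN comparisons. The FLOP and BOP figures in the example are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def combined_ops(flops, bops, bop_weight=1 / 64):
    """Weighted operation count: real-valued FLOPs plus down-weighted binary ops."""
    return flops + bops * bop_weight


# Hypothetical layer mix, not measured from any MeliusNet variant.
example_flops = 1.0e8
example_bops = 6.4e9
print(combined_ops(example_flops, example_bops))  # 200000000.0
```

Under this weighting, 64 binary operations cost as much as one floating-point operation, so a network can carry far more binary than real-valued computation at the same OPs budget.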

5. Training Methodology and Optimization

MeliusNet is trained from scratch on ILSVRC2012 using:

Data augmentation: Random resized cropping to $224\times224$, horizontal flip with probability $0.5$, mean/std normalization.
Initialization: Glorot (Xavier) uniform.
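Optimizer: RAdam; the optimizer hyperparameters and weight decay follow Bethge et al. (2020).
Learning rate schedule: Cosine annealing from the initial learning rate $\eta_0$ over $T$ epochs, with linear warmup over the first 5 epochs.

$$\eta_t = \frac{\eta_0}{2}\left(1 + \cos\frac{\pi t}{T}\right)$$
Batch size: fixed per GPU (see Bethge et al., 2020); total batch size depends on available GPUs.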
Epochs: 120; loss: cross-entropy.
Real-valued (non-binarized) layers: the stem convolution, all $1\times1$ transitions, and the final output layer.
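
The learning rate schedule listed above can be sketched as a plain function of the epoch index. The base learning rate below is a placeholder, and the way warmup and annealing are joined (annealing starts once the 5 warmup epochs end and decays towards zero at epoch 120) is an assumption made for this example.

```python
import math


def learning_rate(epoch, base_lr, total_epochs=120, warmup_epochs=5):
    """Linear warmup to base_lr, then cosine annealing towards zero."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


BASE_LR = 1e-3  # placeholder value, not taken from the source
schedule = [learning_rate(e, BASE_LR) for e in range(120)]
print(schedule[0], schedule[5], schedule[-1])
```

The rate rises linearly during the first 5 epochs, peaks at `BASE_LR`, and then decreases monotonically along a half cosine for the remaining epochs.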

6. Empirical Performance on ImageNet

MeliusNet achieves competitive or superior accuracy relative to both quantized and real-valued compact networks, under comparable constraints on operations and memory.

Cross-domain results against MobileNet-v1:

MobileNet-v1 0.5: $1.49\times10^8$ OPs, 4.7 MB, 63.7%
MeliusNet-C (~0.5x width): 4.5 MB, 64.1% (+0.4)
MobileNet-v1 0.75: $3.25\times10^8$ OPs, 10 MB, 68.4%
MeliusNet-42: $3.25\times10^8$ OPs, 10 MB, 69.2% (+0.8)
MobileNet-v1 1.0: $5.69\times10^8$ OPs, 17 MB, 70.6%
MeliusNet-59: $5.25\times10^8$ OPs, 17 MB, 71.0% (+0.4)

Versus other BNN/quantized baselines (similar OPs/sizes):

Bi-RealNet 18 (grouped stem): ~4 MB, 60.6%
Bi-RealNet 34 (grouped stem): ~5 MB, 63.7%
BinaryDenseNet 28: 4.5 MB, 62.6%
BinaryDenseNet 37: 5.1 MB, 64.2%
MeliusNet A (4,5,5,6)/4g: 4.0 MB, 63.4%
MeliusNet B (4,6,8,6)/2g: 5.0 MB, 65.7%

Quantized baselines with 1–2 bit activations and 2–32 bit weights typically require either much larger OPs or model sizes to match similar accuracy.

7. Analysis of Architectural Advantages

The MeliusNet design strategically addresses the primary challenge of BNNs—recovering representational power lost through binarization—while keeping computation balanced and compatible with highly efficient hardware implementations.

Capacity versus quality: DenseBlocks increase binary channel count; subsequent ImprovementBlocks selectively enhance just the new channels, limiting residual diffusion across previous features and improving representational quality.
Balanced binary computation: Alternation of DenseBlock and ImprovementBlock pairs maintains roughly constant bitwise operation counts per layer, thereby avoiding the quadratic operation growth seen in naive Dense+ResNet hybrids.
Efficient use of real-valued layers: Bottlenecked, grouped real-valued convolutions (stem and transitions) reduce non-binary operation share by approximately 40% compared to prior BNNs, wherein 32-bit computation could account for up to 75% of operations.
Optimization methodology: Training stability and convergence are significantly enhanced by RAdam with annealing, batch normalization, and STE gradient clipping, outperforming standard SGD baselines.

MeliusNet demonstrates that architectural innovations—rather than wider or deeper channel expansions or complex multi-binary basis approaches—can enable 1-bit-per-weight-and-activation networks to match or exceed the accuracy of leading compact 32-bit models on large-scale vision tasks (Bethge et al., 2020).

References

Bethge et al. (2020). MeliusNet: Can Binary Neural Networks Achieve MobileNet-level Accuracy?
