
RBNN: Rotated Binary Neural Network

Updated 17 March 2026
  • Rotated Binary Neural Networks (RBNN) are quantized frameworks that leverage learned orthogonal rotations and adjustable interpolation to reduce angular bias and quantization error in 1-bit neural networks.
  • They overcome standard BNN limitations by aligning full-precision weights with binary counterparts, thereby maximizing weight flip rates and enhancing representational capacity.
  • Empirical evaluations on benchmarks such as CIFAR-10 and ImageNet demonstrate that RBNN consistently outperforms existing state-of-the-art methods in classification accuracy.

Rotated Binary Neural Network (RBNN) is a quantized deep learning framework that addresses the limitations of conventional Binary Neural Networks (BNNs), especially the persistent quantization error induced by angular bias. RBNN systematically aligns the full-precision weight vectors with their binary counterparts through learned orthogonal rotations, coupled with an adjustable interpolation mechanism and a training-aware sign approximation. This architecture advances the representational capacity and empirical accuracy of 1-bit neural networks, consistently outperforming state-of-the-art BNNs on large-scale classification benchmarks (Lin et al., 2020).

1. Limitations of Standard Binary Neural Networks

Binary Neural Networks binarize the weight vector $w\in\mathbb{R}^n$ by applying an elementwise sign function, producing $b = \operatorname{sign}(w)\in\{-1, +1\}^n$. To compensate for the loss of norm, a scaling factor $\alpha$ is often learned, minimizing the error $E = \|w - \alpha b\|_2^2$. However, this approach cannot address the angular discrepancy between $w$ and $b$. The resulting cosine similarity is $\cos\theta = (w^\top b)/(\|w\|_2\|b\|_2)$, yielding a lower bound on the quantization error:

$$\min_\alpha \|w - \alpha b\|_2^2 \geq \|w\|_2^2 \sin^2\theta$$

Thus, a large angular bias $\theta$ imposes an irreducible error: at the optimal scale $\alpha^* = w^\top b / n$ the bound holds with equality, so no choice of scaling factor can compensate for the misalignment. Standard BNNs display both a substantial norm gap and angular misalignment, leading to a marked decline in classification accuracy. Furthermore, the binarization process in these models results in a low "weight flip" rate (empirically 5–10%), which severely under-utilizes the full combinatorial capacity of the weight space, limiting information gain (Lin et al., 2020).
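
To make the bound concrete, here is a minimal NumPy sketch (illustrative, not from the paper's codebase) that computes the optimal scale $\alpha^* = w^\top b / n$ and checks that the residual error equals $\|w\|_2^2\sin^2\theta$:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(256)                    # full-precision weight vector

b = np.where(w >= 0, 1.0, -1.0)                 # elementwise sign binarization
alpha = w @ b / len(w)                          # optimal scale, equals mean(|w|)

cos_theta = (w @ b) / (np.linalg.norm(w) * np.linalg.norm(b))
err = np.linalg.norm(w - alpha * b) ** 2        # error at the optimal scale
bound = np.linalg.norm(w) ** 2 * (1 - cos_theta ** 2)  # ||w||^2 sin^2(theta)

assert abs(err - bound) < 1e-9                  # bound is tight: only rotation can help
```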

2. Rotation-Based Angular Alignment

RBNN addresses angular bias by seeking an orthogonal matrix $R\in\mathbb{R}^{n\times n}$ (with $R^\top R = I$) that rotates $w$ such that $Rw$ is closely aligned with a binary vector $b\in\{-1,+1\}^n$. The optimization problem is posed as:

$$\min_{R,\, b} \|Rw - b\|_2^2 \quad \text{subject to} \quad R^\top R = I,\ b\in\{-1,+1\}^n$$

Maximizing the cosine similarity between $Rw$ and $b$ is equivalent to maximizing

$$\cos\phi = \eta\,\operatorname{trace}(b w^\top R), \qquad \eta = \frac{1}{\sqrt{n}\,\|w\|_2}$$

Direct optimization of a dense $R$ is computationally prohibitive ($\mathcal{O}(n^2)$ time and storage). RBNN circumvents this by factorizing $R$ as a Kronecker product of two smaller orthogonal matrices, $R_1\in\mathbb{R}^{n_1\times n_1}$ and $R_2\in\mathbb{R}^{n_2\times n_2}$, where $n = n_1 n_2$:

$$R = R_1 \otimes R_2$$

This “bi-rotation” formulation substantially reduces optimization complexity, since storing and updating $R_1$ and $R_2$ involves only $n_1^2 + n_2^2$ entries rather than $n^2$, while retaining expressive rotational flexibility.

The corresponding optimization alternates between:

  • B-step: Update $B_{W'} = \operatorname{sign}(R_1^\top W R_2)$.
  • $R_1$-step: SVD-based update maximizing $\operatorname{trace}(G_1 R_1)$ with $G_1 = B_{W'} R_2^\top W^\top$.
  • $R_2$-step: SVD-based update maximizing $\operatorname{trace}(G_2 R_2)$ with $G_2 = W^\top R_1 B_{W'}$.

Typically, three cycles of alternating minimization suffice per epoch.
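
The following NumPy sketch illustrates these alternating updates, assuming the layer weight has been reshaped into an $n_1\times n_2$ matrix $W$; the function and variable names are illustrative, not taken from the official repository. Each trace-maximization step uses the classical orthogonal-Procrustes solution $R = V U^\top$ for $G = U\Sigma V^\top$:

```python
import numpy as np

def procrustes_rotation(G):
    """Orthogonal R maximizing trace(G @ R): R = V @ U.T for G = U S V^T."""
    U, _, Vt = np.linalg.svd(G)
    return Vt.T @ U.T

def bi_rotation(W, n_cycles=3):
    """Alternating B / R1 / R2 updates for a weight matrix W of shape (n1, n2)."""
    n1, n2 = W.shape
    R1, R2 = np.eye(n1), np.eye(n2)
    for _ in range(n_cycles):
        B = np.where(R1.T @ W @ R2 >= 0, 1.0, -1.0)   # B-step
        R1 = procrustes_rotation(B @ R2.T @ W.T)      # R1-step: G1 = B R2^T W^T
        R2 = procrustes_rotation(W.T @ R1 @ B)        # R2-step: G2 = W^T R1 B
    return R1, R2

# The full rotation is then the Kronecker product R = np.kron(R1, R2).
```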

3. Adjustable Rotated Weight Vector

Merely rotating $w$ may not ensure optimal binarization, as $R^\top w$ may over- or under-shoot the desired binary vertex. RBNN interposes an adjustable interpolation between $w$ and $R^\top w$:

$$\tilde{w} = (1-\alpha)\, w + \alpha\, (R^\top w)$$

Here, $\alpha\in[0,1]$ is a learnable parameter trained by back-propagation (specifically, $\alpha = |\sin\beta|$ with $\beta$ the trained variable), starting near zero in early training and increasing toward one. This mitigates local minima and enables dynamically controlled exploitation of the rotation, improving training convergence and further reducing quantization error.
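
A minimal PyTorch sketch of this interpolation (helper and parameter names are hypothetical; RBNN stores a $\beta$ per layer and derives $\alpha$ from it):

```python
import torch

beta = torch.nn.Parameter(torch.zeros(1))   # trained by back-propagation; alpha starts at 0

def adjustable_weight(w, R, beta):
    """Interpolate between w and its rotated version R^T w."""
    alpha = torch.sin(beta).abs()           # alpha = |sin(beta)| stays in [0, 1]
    return (1 - alpha) * w + alpha * (R.t() @ w)
```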

4. Training-Aware Sign Function Approximation

The sign function’s derivative is zero almost everywhere, impeding gradient propagation. To address this, RBNN introduces a smooth, epoch-dependent proxy function $F(x)$ for use in the forward pass:

$$F(x) = \begin{cases} k\left(-\operatorname{sign}(x)\,\dfrac{t^2 x^2}{2} + \sqrt{2}\,t x\right) & |x| < \dfrac{\sqrt{2}}{t} \\[4pt] k\,\operatorname{sign}(x) & \text{otherwise} \end{cases}$$

where the sharpness parameter $t$ grows from $T_\text{min}=10^{-2}$ to $T_\text{max}=10^{1}$ as a function of the current epoch $e$ out of $E$ total epochs, and $k = \max(1/t, 1)$. The gradient for backpropagation is

$$F'(x) = \max\!\left(k\left(\sqrt{2}\,t - |t^2 x|\right),\, 0\right)$$

As training progresses, $F(x)\to \operatorname{sign}(x)$, ensuring binary-valued weights. The chain rule propagates gradients through both $w$ and $\alpha$:

$$\frac{\partial\mathcal{L}}{\partial w} = \frac{\partial\mathcal{L}}{\partial F(\tilde{w})} \cdot F'(\tilde{w}) \cdot \left[(1-\alpha)I + \alpha R^\top\right]$$

$$\frac{\partial\mathcal{L}}{\partial \alpha} = \sum_j \left(\frac{\partial\mathcal{L}}{\partial F(\tilde{w})}\right)_j F'(\tilde{w})_j \left(R^\top w - w\right)_j$$
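
A sketch of this approximation as a custom autograd function, written as an illustrative reading of the formulas above rather than the reference implementation; the forward pass emits $F(x)$ while the backward pass applies $F'(x)$:

```python
import math
import torch

class TrainingAwareSign(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, t):
        k = max(1.0 / t, 1.0)
        ctx.save_for_backward(x)
        ctx.t, ctx.k = t, k
        # Smooth region |x| < sqrt(2)/t, saturated to k*sign(x) elsewhere.
        smooth = k * (-torch.sign(x) * (t ** 2) * x ** 2 / 2 + math.sqrt(2) * t * x)
        return torch.where(x.abs() < math.sqrt(2) / t, smooth, k * torch.sign(x))

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        t, k = ctx.t, ctx.k
        # F'(x) = max(k(sqrt(2) t - |t^2 x|), 0)
        grad_f = (k * (math.sqrt(2) * t - (t ** 2 * x).abs())).clamp(min=0.0)
        return grad_out * grad_f, None          # no gradient w.r.t. t

# Usage: y = TrainingAwareSign.apply(w_tilde, t)
```

As $t$ approaches $T_\text{max}$, the smooth region $|x| < \sqrt{2}/t$ shrinks and $k \to 1$, so the output converges to $\operatorname{sign}(x)$.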

5. Training Procedure and Algorithmic Outline

The RBNN training process alternates between rotation optimization and weight interpolation:

  1. Epoch-level Rotation: For each layer $i$ at the start of epoch $e$:
    • Freeze $w^i$.
    • Run three cycles of the alternating update steps for $B$, $R_1$, and $R_2$.
    • Compose $R^i = R_1^i \otimes R_2^i$.
  2. Mini-batch Processing: For each mini-batch:
    • Compute $\tilde{w}^i = (1-\alpha^i)\, w^i + \alpha^i (R^i)^\top w^i$ per layer.
    • Forward: binarize using $F(\cdot)$; compute convolutions via XNOR/bitcount.
    • Backward: compute gradients using $F'(\cdot)$; update $w^i$ and $\beta^i$ (hence $\alpha^i$).

Notably, convolutional and fully connected layers (excluding the first and last) are binarized with 1-bit weights and activations in all reported experiments.
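
A condensed, hypothetical outline of one training epoch tying these steps together (the layer API, schedule constants, and helper names are assumptions, not the repository's interface):

```python
import torch

T_MIN, T_MAX = 1e-2, 1e1                          # assumed endpoints of the t schedule

def train_epoch(model, loader, optimizer, criterion, epoch, total_epochs):
    t = T_MIN * (T_MAX / T_MIN) ** (epoch / total_epochs)   # exponential sharpening
    with torch.no_grad():                         # 1. epoch-level rotation, weights frozen
        for layer in model.binary_layers:         # all layers except first and last
            layer.update_rotation(n_cycles=3)     # alternating B / R1 / R2 steps
    for x, y in loader:                           # 2. mini-batch processing
        optimizer.zero_grad()
        loss = criterion(model(x, t=t), y)        # forward binarizes with F(.; t)
        loss.backward()                           # backward applies F'(.; t)
        optimizer.step()                          # updates each w^i and beta^i
```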

6. Empirical Evaluation

RBNN demonstrates consistent accuracy improvements over contemporary BNN approaches on standard benchmarks. Key results are summarized below.

CIFAR-10 Top-1 Accuracy

| Network | Method | W/A | Accuracy |
|---|---|---|---|
| ResNet-18 | FP | 32/32 | 93.0% |
| ResNet-18 | RAD | 1/1 | 90.5% |
| ResNet-18 | IR-Net | 1/1 | 91.5% |
| ResNet-18 | RBNN | 1/1 | 92.2% |
| ResNet-20 | FP | 32/32 | 91.7% |
| ResNet-20 | DoReFa | 1/1 | 79.3% |
| ResNet-20 | DSQ | 1/1 | 84.1% |
| ResNet-20 | IR-Net | 1/1 | 85.4% |
| ResNet-20 | RBNN | 1/1 | 86.5% |
| ResNet-20 | IR-Net* (Bi-Real) | 1/1 | 86.5% |
| ResNet-20 | RBNN* (Bi-Real) | 1/1 | 87.8% |
| VGG-small | FP | 32/32 | 91.7% |
| VGG-small | LAB | 1/1 | 87.7% |
| VGG-small | XNOR-Net | 1/1 | 89.8% |
| VGG-small | BNN | 1/1 | 89.9% |
| VGG-small | RAD | 1/1 | 90.0% |
| VGG-small | IR-Net | 1/1 | 90.4% |
| VGG-small | RBNN | 1/1 | 91.3% |

ImageNet Top-1 / Top-5 Accuracy

| Network | Method | W/A | Top-1 | Top-5 |
|---|---|---|---|---|
| ResNet-18 | FP | 32/32 | 69.6% | 89.2% |
| ResNet-18 | ABC-Net | 1/1 | 42.7% | 67.6% |
| ResNet-18 | XNOR-Net | 1/1 | 51.2% | 73.2% |
| ResNet-18 | BNN+ | 1/1 | 53.0% | 72.6% |
| ResNet-18 | DoReFa | 1/2 | 53.4% | – |
| ResNet-18 | Bi-Real | 1/1 | 56.4% | 79.5% |
| ResNet-18 | XNOR++ | 1/1 | 57.1% | 79.9% |
| ResNet-18 | IR-Net | 1/1 | 58.1% | 80.0% |
| ResNet-18 | RBNN | 1/1 | 59.9% | 81.9% |
| ResNet-34 | FP | 32/32 | 73.3% | 91.3% |
| ResNet-34 | ABC-Net | 1/1 | 52.4% | 76.5% |
| ResNet-34 | Bi-Real | 1/1 | 62.2% | 83.9% |
| ResNet-34 | IR-Net | 1/1 | 62.9% | 84.1% |
| ResNet-34 | RBNN | 1/1 | 63.1% | 84.4% |

Relative to IR-Net, the strongest baseline above, RBNN improves ImageNet top-1 accuracy by 1.8% on ResNet-18 and 0.2% on ResNet-34; against XNOR-Net on ResNet-18 the gain is 8.7%. On CIFAR-10, RBNN exceeds IR-Net by 0.7–1.3% across the evaluated networks.

RBNN also produces significantly higher weight flip rates, about 50% per layer, maximizing entropy and more fully exploiting the $2^n$-state capacity of binary weights, versus 5–10% for standard BNNs. The reduction in angular bias and quantization error is evident per layer, and RBNN weight distributions exhibit pronounced bimodality at $\pm 1$.

7. Implementation and Reproducibility

RBNN is implemented in PyTorch. Principal modules include:

  • rotation.py (bi-rotation optimization),
  • adjustable_weight.py (computing w~\tilde{w} and α\alpha scheduling),
  • sign_approx.py (forward F(x)F(x) and backward F(x)F'(x)),
  • training.py (overall training workflow).

Reproducibility is ensured through fixed random seeds and configuration files specifying hyperparameter schedules (including learning rate, $t$, and $k$). Pre-trained full-precision models and all source artifacts are available at https://github.com/lmbxmu/RBNN. The training setups use SGD with momentum 0.9, weight decay $5\times10^{-4}$, and standardized schedules for learning rate and batch size (Lin et al., 2020).
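
The reported optimizer settings correspond to a configuration along these lines (the learning rate shown is a placeholder; the actual schedules live in the repository's config files):

```python
import torch

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,               # placeholder; see the released configs for the schedule
    momentum=0.9,         # momentum reported above
    weight_decay=5e-4,    # weight decay reported above
)
```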

As noted above, RBNN rotates and binarizes all convolutional and fully connected layers except the network's first and last, following prevailing best practices for 1-bit DNN quantization.


RBNN systematically addresses quantization error in binarized neural networks through angular alignment, maximized weight flips, and an adjustable binarization surrogate. The empirical evaluation confirms notable accuracy advantages and higher information capacity compared to existing 1-bit quantization methods (Lin et al., 2020).

References

Lin, M., Ji, R., Xu, Z., Zhang, B., Wang, Y., Wu, Y., Huang, F., and Lin, C.-W. (2020). Rotated Binary Neural Network. Advances in Neural Information Processing Systems (NeurIPS) 33.
