
RBNN: Rotated Binary Neural Network

Updated 17 March 2026
  • Rotated Binary Neural Networks (RBNN) are quantized frameworks that leverage learned orthogonal rotations and adjustable interpolation to reduce angular bias and quantization error in 1-bit neural networks.
  • They overcome standard BNN limitations by aligning full-precision weights with binary counterparts, thereby maximizing weight flip rates and enhancing representational capacity.
  • Empirical evaluations on benchmarks such as CIFAR-10 and ImageNet demonstrate that RBNN consistently outperforms existing state-of-the-art methods in classification accuracy.

Rotated Binary Neural Network (RBNN) is a quantized deep learning framework that addresses the limitations of conventional Binary Neural Networks (BNNs), especially the persistent quantization error induced by angular bias. RBNN systematically aligns the full-precision weight vectors with their binary counterparts through learned orthogonal rotations, coupled with an adjustable interpolation mechanism and a training-aware sign approximation. This architecture advances the representational capacity and empirical accuracy of 1-bit neural networks, consistently outperforming state-of-the-art BNNs on large-scale classification benchmarks (Lin et al., 2020).

1. Limitations of Standard Binary Neural Networks

Binary Neural Networks binarize the weight vector $w\in\mathbb{R}^n$ by applying an elementwise sign function, producing $b = \operatorname{sign}(w)\in\{-1, +1\}^n$. To compensate for the loss of norm, a scaling factor $\alpha$ is often learned, minimizing the error $E = \|w - \alpha b\|_2^2$. However, this approach cannot address the angular discrepancy between $w$ and $b$. The resulting cosine similarity is $\cos\theta = (w^\top b)/(\|w\|_2\|b\|_2)$, yielding a lower bound on the quantization error:

$$\min_\alpha \|w - \alpha b\|_2^2 \geq \|w\|_2^2 \sin^2\theta$$

Thus, a large angular bias $\theta$ imposes an irreducible error: at the optimal scale $\alpha^* = w^\top b / n$ the bound holds with equality, so no choice of scaling factor can compensate for the misalignment. Standard BNNs display both a substantial norm gap and angular misalignment, leading to a marked decline in classification accuracy. Furthermore, the binarization process in these models results in a low "weight flip" rate (empirically 5–10%), which severely under-utilizes the full combinatorial capacity of the weight space, limiting information gain (Lin et al., 2020).
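
To make the bound concrete, here is a minimal NumPy sketch (illustrative, not from the paper's codebase) that computes the optimal scale $\alpha^* = w^\top b / n$ and checks that the residual error equals $\|w\|_2^2\sin^2\theta$:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(256)                    # full-precision weight vector

b = np.where(w >= 0, 1.0, -1.0)                 # elementwise sign binarization
alpha = w @ b / len(w)                          # optimal scale, equals mean(|w|)

cos_theta = (w @ b) / (np.linalg.norm(w) * np.linalg.norm(b))
err = np.linalg.norm(w - alpha * b) ** 2        # error at the optimal scale
bound = np.linalg.norm(w) ** 2 * (1 - cos_theta ** 2)  # ||w||^2 sin^2(theta)

assert abs(err - bound) < 1e-9                  # bound is tight: only rotation can help
```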

2. Rotation-Based Angular Alignment

RBNN addresses angular bias by seeking an orthogonal matrix $R\in\mathbb{R}^{n\times n}$ (with $R^\top R = I$) that rotates $w$ such that $Rw$ is closely aligned with a binary vector $b\in\{-1,+1\}^n$. The optimization problem is posed as:

$$\min_{R,\, b} \|Rw - b\|_2^2 \quad \text{subject to} \quad R^\top R = I,\ b\in\{-1,+1\}^n$$

Maximizing the cosine similarity between $Rw$ and $b$ is equivalent to maximizing

$$\cos\phi = \eta\,\operatorname{trace}(b w^\top R), \qquad \eta = \frac{1}{\sqrt{n}\,\|w\|_2}$$

Direct optimization of a dense $R$ is computationally prohibitive ($\mathcal{O}(n^2)$ time and storage). RBNN circumvents this by factorizing $R$ as a Kronecker product of two smaller orthogonal matrices, $R_1\in\mathbb{R}^{n_1\times n_1}$ and $R_2\in\mathbb{R}^{n_2\times n_2}$, where $n = n_1 n_2$:

$$R = R_1 \otimes R_2$$

This “bi-rotation” formulation substantially reduces optimization complexity, since storing and updating $R_1$ and $R_2$ involves only $n_1^2 + n_2^2$ entries rather than $n^2$, while retaining expressive rotational flexibility.

The corresponding optimization alternates between:

  • B-step: Update $B_{W'} = \operatorname{sign}(R_1^\top W R_2)$.
  • $R_1$-step: SVD-based update maximizing $\operatorname{trace}(G_1 R_1)$ with $G_1 = B_{W'} R_2^\top W^\top$.
  • $R_2$-step: SVD-based update maximizing $\operatorname{trace}(G_2 R_2)$ with $G_2 = W^\top R_1 B_{W'}$.

Typically, three cycles of alternating minimization suffice per epoch.
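
The following NumPy sketch illustrates these alternating updates, assuming the layer weight has been reshaped into an $n_1\times n_2$ matrix $W$; the function and variable names are illustrative, not taken from the official repository. Each trace-maximization step uses the classical orthogonal-Procrustes solution $R = V U^\top$ for $G = U\Sigma V^\top$:

```python
import numpy as np

def procrustes_rotation(G):
    """Orthogonal R maximizing trace(G @ R): R = V @ U.T for G = U S V^T."""
    U, _, Vt = np.linalg.svd(G)
    return Vt.T @ U.T

def bi_rotation(W, n_cycles=3):
    """Alternating B / R1 / R2 updates for a weight matrix W of shape (n1, n2)."""
    n1, n2 = W.shape
    R1, R2 = np.eye(n1), np.eye(n2)
    for _ in range(n_cycles):
        B = np.where(R1.T @ W @ R2 >= 0, 1.0, -1.0)   # B-step
        R1 = procrustes_rotation(B @ R2.T @ W.T)      # R1-step: G1 = B R2^T W^T
        R2 = procrustes_rotation(W.T @ R1 @ B)        # R2-step: G2 = W^T R1 B
    return R1, R2

# The full rotation is then the Kronecker product R = np.kron(R1, R2).
```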

3. Adjustable Rotated Weight Vector

Merely rotating $w$ may not ensure optimal binarization, as $R^\top w$ may over- or under-shoot the desired binary vertex. RBNN interposes an adjustable interpolation between $w$ and $R^\top w$:

$$\tilde{w} = (1-\alpha)\, w + \alpha\, (R^\top w)$$

Here, $\alpha\in[0,1]$ is a learnable parameter trained by back-propagation (specifically, $\alpha = |\sin\beta|$ with $\beta$ the trained variable), starting near zero in early training and increasing toward one. This mitigates local minima and enables dynamically controlled exploitation of the rotation, improving training convergence and further reducing quantization error.
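
A minimal PyTorch sketch of this interpolation (helper and parameter names are hypothetical; RBNN stores a $\beta$ per layer and derives $\alpha$ from it):

```python
import torch

beta = torch.nn.Parameter(torch.zeros(1))   # trained by back-propagation; alpha starts at 0

def adjustable_weight(w, R, beta):
    """Interpolate between w and its rotated version R^T w."""
    alpha = torch.sin(beta).abs()           # alpha = |sin(beta)| stays in [0, 1]
    return (1 - alpha) * w + alpha * (R.t() @ w)
```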

4. Training-Aware Sign Function Approximation

The sign function’s derivative is zero almost everywhere, impeding gradient propagation. To address this, RBNN introduces a smooth, epoch-dependent proxy function $F(x)$ for use in the forward pass:

$$F(x) = \begin{cases} k\left(-\operatorname{sign}(x)\,\dfrac{t^2 x^2}{2} + \sqrt{2}\,t x\right) & |x| < \dfrac{\sqrt{2}}{t} \\[4pt] k\,\operatorname{sign}(x) & \text{otherwise} \end{cases}$$

where the sharpness parameter $t$ grows from $T_\text{min}=10^{-2}$ to $T_\text{max}=10^{1}$ as a function of the current epoch $e$ out of $E$ total epochs, and $k = \max(1/t, 1)$. The gradient for backpropagation is

$$F'(x) = \max\!\left(k\left(\sqrt{2}\,t - |t^2 x|\right),\, 0\right)$$

As training progresses, $F(x)\to \operatorname{sign}(x)$, ensuring binary-valued weights. The chain rule propagates gradients through both $w$ and $\alpha$:

$$\frac{\partial\mathcal{L}}{\partial w} = \frac{\partial\mathcal{L}}{\partial F(\tilde{w})} \cdot F'(\tilde{w}) \cdot \left[(1-\alpha)I + \alpha R^\top\right]$$

$$\frac{\partial\mathcal{L}}{\partial \alpha} = \sum_j \left(\frac{\partial\mathcal{L}}{\partial F(\tilde{w})}\right)_j F'(\tilde{w})_j \left(R^\top w - w\right)_j$$
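
A sketch of this approximation as a custom autograd function, written as an illustrative reading of the formulas above rather than the reference implementation; the forward pass emits $F(x)$ while the backward pass applies $F'(x)$:

```python
import math
import torch

class TrainingAwareSign(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, t):
        k = max(1.0 / t, 1.0)
        ctx.save_for_backward(x)
        ctx.t, ctx.k = t, k
        # Smooth region |x| < sqrt(2)/t, saturated to k*sign(x) elsewhere.
        smooth = k * (-torch.sign(x) * (t ** 2) * x ** 2 / 2 + math.sqrt(2) * t * x)
        return torch.where(x.abs() < math.sqrt(2) / t, smooth, k * torch.sign(x))

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        t, k = ctx.t, ctx.k
        # F'(x) = max(k(sqrt(2) t - |t^2 x|), 0)
        grad_f = (k * (math.sqrt(2) * t - (t ** 2 * x).abs())).clamp(min=0.0)
        return grad_out * grad_f, None          # no gradient w.r.t. t

# Usage: y = TrainingAwareSign.apply(w_tilde, t)
```

As $t$ approaches $T_\text{max}$, the smooth region $|x| < \sqrt{2}/t$ shrinks and $k \to 1$, so the output converges to $\operatorname{sign}(x)$.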

5. Training Procedure and Algorithmic Outline

The RBNN training process alternates between rotation optimization and weight interpolation:

  1. Epoch-level Rotation: For each layer $i$ at the start of epoch $e$:
    • Freeze $w^i$.
    • Run three cycles of the alternating update steps for $B$, $R_1$, and $R_2$.
    • Compose $R^i = R_1^i \otimes R_2^i$.
  2. Mini-batch Processing: For each mini-batch:
    • Compute $\tilde{w}^i = (1-\alpha^i)\, w^i + \alpha^i (R^i)^\top w^i$ per layer.
    • Forward: binarize using $F(\cdot)$; compute convolutions via XNOR/bitcount.
    • Backward: compute gradients using $F'(\cdot)$; update $w^i$ and $\beta^i$ (hence $\alpha^i$).

Notably, convolutional and fully connected layers (excluding the first and last) are binarized with 1-bit weights and activations in all reported experiments.
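
A condensed, hypothetical outline of one training epoch tying these steps together (the layer API, schedule constants, and helper names are assumptions, not the repository's interface):

```python
import torch

T_MIN, T_MAX = 1e-2, 1e1                          # assumed endpoints of the t schedule

def train_epoch(model, loader, optimizer, criterion, epoch, total_epochs):
    t = T_MIN * (T_MAX / T_MIN) ** (epoch / total_epochs)   # exponential sharpening
    with torch.no_grad():                         # 1. epoch-level rotation, weights frozen
        for layer in model.binary_layers:         # all layers except first and last
            layer.update_rotation(n_cycles=3)     # alternating B / R1 / R2 steps
    for x, y in loader:                           # 2. mini-batch processing
        optimizer.zero_grad()
        loss = criterion(model(x, t=t), y)        # forward binarizes with F(.; t)
        loss.backward()                           # backward applies F'(.; t)
        optimizer.step()                          # updates each w^i and beta^i
```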

6. Empirical Evaluation

RBNN demonstrates consistent accuracy improvements over contemporary BNN approaches on standard benchmarks. Key results are summarized below.

CIFAR-10 Top-1 Accuracy

| Network | Method | W/A | Accuracy |
|---|---|---|---|
| ResNet-18 | FP | 32/32 | 93.0% |
| ResNet-18 | RAD | 1/1 | 90.5% |
| ResNet-18 | IR-Net | 1/1 | 91.5% |
| ResNet-18 | RBNN | 1/1 | 92.2% |
| ResNet-20 | FP | 32/32 | 91.7% |
| ResNet-20 | DoReFa | 1/1 | 79.3% |
| ResNet-20 | DSQ | 1/1 | 84.1% |
| ResNet-20 | IR-Net | 1/1 | 85.4% |
| ResNet-20 | RBNN | 1/1 | 86.5% |
| ResNet-20 | IR-Net* (Bi-Real) | 1/1 | 86.5% |
| ResNet-20 | RBNN* (Bi-Real) | 1/1 | 87.8% |
| VGG-small | FP | 32/32 | 91.7% |
| VGG-small | LAB | 1/1 | 87.7% |
| VGG-small | XNOR-Net | 1/1 | 89.8% |
| VGG-small | BNN | 1/1 | 89.9% |
| VGG-small | RAD | 1/1 | 90.0% |
| VGG-small | IR-Net | 1/1 | 90.4% |
| VGG-small | RBNN | 1/1 | 91.3% |

ImageNet Top-1 / Top-5 Accuracy

| Network | Method | W/A | Top-1 | Top-5 |
|---|---|---|---|---|
| ResNet-18 | FP | 32/32 | 69.6% | 89.2% |
| ResNet-18 | ABC-Net | 1/1 | 42.7% | 67.6% |
| ResNet-18 | XNOR-Net | 1/1 | 51.2% | 73.2% |
| ResNet-18 | BNN+ | 1/1 | 53.0% | 72.6% |
| ResNet-18 | DoReFa | 1/2 | 53.4% | – |
| ResNet-18 | Bi-Real | 1/1 | 56.4% | 79.5% |
| ResNet-18 | XNOR++ | 1/1 | 57.1% | 79.9% |
| ResNet-18 | IR-Net | 1/1 | 58.1% | 80.0% |
| ResNet-18 | RBNN | 1/1 | 59.9% | 81.9% |
| ResNet-34 | FP | 32/32 | 73.3% | 91.3% |
| ResNet-34 | ABC-Net | 1/1 | 52.4% | 76.5% |
| ResNet-34 | Bi-Real | 1/1 | 62.2% | 83.9% |
| ResNet-34 | IR-Net | 1/1 | 62.9% | 84.1% |
| ResNet-34 | RBNN | 1/1 | 63.1% | 84.4% |

Relative to IR-Net, the strongest baseline above, RBNN improves ImageNet top-1 accuracy by 1.8% on ResNet-18 and 0.2% on ResNet-34; against XNOR-Net on ResNet-18 the gain is 8.7%. On CIFAR-10, RBNN exceeds IR-Net by 0.7–1.3% across the evaluated networks.

RBNN also produces significantly higher weight flip rates, about 50% per layer, maximizing entropy and more fully exploiting the $2^n$-state capacity of binary weights, versus 5–10% for standard BNNs. The reduction in angular bias and quantization error is evident per layer, and RBNN weight distributions exhibit pronounced bimodality at $\pm 1$.

7. Implementation and Reproducibility

RBNN is implemented in PyTorch. Principal modules include:

  • rotation.py (bi-rotation optimization),
  • adjustable_weight.py (computing w~\tilde{w} and α\alpha scheduling),
  • sign_approx.py (forward F(x)F(x) and backward F(x)F'(x)),
  • training.py (overall training workflow).

Reproducibility is ensured through fixed random seeds and configuration files specifying hyperparameter schedules (including learning rate, $t$, and $k$). Pre-trained full-precision models and all source artifacts are available at https://github.com/lmbxmu/RBNN. The training setups use SGD with momentum 0.9, weight decay $5\times10^{-4}$, and standardized schedules for learning rate and batch size (Lin et al., 2020).
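
The reported optimizer settings correspond to a configuration along these lines (the learning rate shown is a placeholder; the actual schedules live in the repository's config files):

```python
import torch

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,               # placeholder; see the released configs for the schedule
    momentum=0.9,         # momentum reported above
    weight_decay=5e-4,    # weight decay reported above
)
```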

As noted above, RBNN rotates and binarizes all convolutional and fully connected layers except the network's first and last, following prevailing best practices for 1-bit DNN quantization.


RBNN systematically addresses quantization error in binarized neural networks through angular alignment, maximized weight flips, and an adjustable binarization surrogate. The empirical evaluation confirms notable accuracy advantages and higher information capacity compared to existing 1-bit quantization methods (Lin et al., 2020).

References

Lin, M., Ji, R., Xu, Z., Zhang, B., Wang, Y., Wu, Y., Huang, F., and Lin, C.-W. (2020). Rotated Binary Neural Network. Advances in Neural Information Processing Systems (NeurIPS) 33.
