RBNN: Rotated Binary Neural Network
- The Rotated Binary Neural Network (RBNN) is a 1-bit quantization framework that leverages learned orthogonal rotations and an adjustable interpolation to reduce angular bias and quantization error in binary neural networks.
- They overcome standard BNN limitations by aligning full-precision weights with binary counterparts, thereby maximizing weight flip rates and enhancing representational capacity.
- Empirical evaluations on benchmarks such as CIFAR-10 and ImageNet demonstrate that RBNN consistently outperforms existing state-of-the-art methods in classification accuracy.
Rotated Binary Neural Network (RBNN) is a quantized deep learning framework that addresses the limitations of conventional Binary Neural Networks (BNNs), especially the persistent quantization error induced by angular bias. RBNN systematically aligns the full-precision weight vectors with their binary counterparts through learned orthogonal rotations, coupled with an adjustable interpolation mechanism and a training-aware sign approximation. This architecture advances the representational capacity and empirical accuracy of 1-bit neural networks, consistently outperforming state-of-the-art BNNs on large-scale classification benchmarks (Lin et al., 2020).
1. Limitations of Standard Binary Neural Networks
Binary Neural Networks binarize the weight vector $w \in \mathbb{R}^n$ by applying an elementwise sign function, producing $b = \operatorname{sign}(w) \in \{-1,+1\}^n$. To compensate for the loss of norm, a scaling factor $s$ is often learned, minimizing the error $\|w - s\,b\|_2^2$. However, this approach cannot address the angular discrepancy between $w$ and $b$. The resulting cosine similarity is $\cos\theta = w^\top b / (\|w\|_2 \sqrt{n})$, yielding a lower bound on the quantization error:

$$\min_s \|w - s\,b\|_2^2 = \|w\|_2^2\left(1 - \cos^2\theta\right) = \|w\|_2^2 \sin^2\theta.$$
Thus, a large angular bias imposes an irreducible error. Standard BNNs display both a substantial norm gap and angular misalignment, leading to a marked decline in classification accuracy. Furthermore, the binarization process in these models results in a low “weight flip” rate—empirically 5–10%—which severely under-utilizes the full combinatorial capacity of the weight space, limiting information gain (Lin et al., 2020).
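The scale-only error floor can be checked numerically. In this sketch (all names are illustrative), the optimal scale $s^* = \|w\|_1/n$ leaves exactly the residual $\|w\|_2^2 \sin^2\theta$:

```python
import numpy as np

# Numerical check of the error floor: with the best scale
# s* = ||w||_1 / n, the residual equals ||w||^2 * sin^2(theta).
rng = np.random.default_rng(0)
n = 64
w = rng.standard_normal(n)

b = np.sign(w)                              # 1-bit weights
s = np.abs(w).mean()                        # argmin_s ||w - s b||^2
residual = np.sum((w - s * b) ** 2)

cos_theta = (w @ b) / (np.linalg.norm(w) * np.sqrt(n))
floor = np.sum(w ** 2) * (1.0 - cos_theta ** 2)

print(np.isclose(residual, floor))          # True: the bound is tight at s*
```

No choice of scale can push the error below this floor; only reducing the angle $\theta$ itself can, which is exactly what the rotation in the next section does.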
2. Rotation-Based Angular Alignment
RBNN addresses angular bias by seeking an orthogonal matrix $R$ ($R^\top R = I$) that rotates $w$ such that $R^\top w$ is closely aligned with a binary vector $b \in \{-1,+1\}^n$. The optimization problem is posed as:

$$\max_{R,\,b}\; (R^\top w)^\top b \quad \text{s.t.} \quad R^\top R = I,\; b \in \{-1,+1\}^n.$$

Maximizing the cosine similarity between $R^\top w$ and $b$ is equivalent to maximizing $(R^\top w)^\top b / (\|w\|_2 \sqrt{n})$, since rotation preserves $\|w\|_2$ and every binary vector has norm $\sqrt{n}$.
Direct optimization of a dense $R$ is computationally prohibitive ($\mathcal{O}(n^3)$ time and $\mathcal{O}(n^2)$ storage). RBNN circumvents this by factorizing $R$ as a Kronecker product of two smaller orthogonal matrices, $R_1 \in \mathbb{R}^{n_1 \times n_1}$ and $R_2 \in \mathbb{R}^{n_2 \times n_2}$, where $n = n_1 n_2$:

$$R = R_1 \otimes R_2.$$

Reshaping $w$ (column-major) into $W \in \mathbb{R}^{n_2 \times n_1}$, the rotated vector satisfies $R^\top w = \operatorname{vec}(R_2^\top W R_1)$.
This “bi-rotation” formulation substantially reduces optimization complexity while retaining expressive rotational flexibility.
The corresponding optimization alternates between:
- B-step: Update $B = \operatorname{sign}(R_2^\top W R_1)$.
- $R_1$-step: SVD-based update maximizing $\operatorname{tr}(B^\top R_2^\top W R_1)$ with $B$, $R_2$ fixed: from $W^\top R_2 B = U_1 \Sigma_1 V_1^\top$, set $R_1 = U_1 V_1^\top$.
- $R_2$-step: SVD-based update maximizing the same objective with $B$, $R_1$ fixed: from $W R_1 B^\top = U_2 \Sigma_2 V_2^\top$, set $R_2 = U_2 V_2^\top$.

Typically, three cycles of this alternating optimization suffice per epoch.
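The alternating updates can be sketched in NumPy; the shapes and names here are ours, and each rotation step is the standard orthogonal Procrustes solution:

```python
import numpy as np

# Minimal sketch of the bi-rotation alternating updates (illustrative,
# not the authors' code). W is the layer weight reshaped to n2 x n1.
rng = np.random.default_rng(0)
n1, n2 = 8, 4
W = rng.standard_normal((n2, n1))
R1, R2 = np.eye(n1), np.eye(n2)

def cosine(W, R1, R2, B):
    # cosine similarity between the rotated weight and its binarization
    v = (R2.T @ W @ R1).ravel()
    return (v @ B.ravel()) / (np.linalg.norm(W) * np.sqrt(W.size))

for _ in range(3):                            # three cycles usually suffice
    B = np.sign(R2.T @ W @ R1)                # B-step
    U, _, Vt = np.linalg.svd(W.T @ R2 @ B)    # R1-step (Procrustes)
    R1 = U @ Vt
    U, _, Vt = np.linalg.svd(W @ R1 @ B.T)    # R2-step (Procrustes)
    R2 = U @ Vt

B = np.sign(R2.T @ W @ R1)
print(round(cosine(W, R1, R2, B), 3))         # never below the unrotated value
```

Each step maximizes the shared objective with the other two blocks fixed, so the alignment is monotonically non-decreasing across cycles.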
3. Adjustable Rotated Weight Vector
Merely rotating $w$ may not ensure optimal binarization, as $R^\top w$ may over- or under-shoot the desired binary vertex. RBNN interposes an adjustable interpolation between $w$ and $R^\top w$:

$$\tilde{w} = \big[(1-\alpha)I + \alpha R^\top\big]\,w = (1-\alpha)\,w + \alpha\,R^\top w.$$
Here, $\alpha \in [0,1]$ is a learnable parameter trained by back-propagation, starting near zero in early training and increasing toward one, which mitigates local minima and enables dynamically controlled rotation exploitation. This mechanism improves training convergence and further reduces quantization error.
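The interpolation itself is simple to sketch; the random orthogonal stand-in for $R$ and the helper name `adjustable` are illustrative:

```python
import numpy as np

# The adjustable weight (1-alpha)*w + alpha*R^T w: at alpha=0 it is the
# raw weight, at alpha=1 the fully rotated one. R here is a random
# orthogonal stand-in, not a learned rotation.
rng = np.random.default_rng(1)
n = 6
w = rng.standard_normal(n)
R, _ = np.linalg.qr(rng.standard_normal((n, n)))   # Q factor is orthogonal

def adjustable(w, R, alpha):
    return (1.0 - alpha) * w + alpha * (R.T @ w)

w_early = adjustable(w, R, 0.05)    # early training: close to w
w_late = adjustable(w, R, 1.0)      # late training: exactly R^T w
print(np.allclose(w_late, R.T @ w))
```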
4. Training-Aware Sign Function Approximation
The sign function’s derivative is zero almost everywhere, impeding gradient propagation. To address this, RBNN introduces a smooth, epoch-dependent proxy function for use in the forward pass during training, of the form

$$F(w) = \tanh(t\,w),$$

where the sharpness parameter $t$ evolves from $T_{\min} = 10^{-1}$ to $T_{\max} = 10^{1}$ over training according to $t = T_{\min} \cdot 10^{\frac{i}{N}\log_{10}(T_{\max}/T_{\min})}$, with $i$ the current epoch and $N$ the total number of epochs. The gradient used for backpropagation is

$$F'(w) = t\left(1 - \tanh^2(t\,w)\right).$$
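The epoch-dependent sharpening can be sketched as follows; the tanh form, the endpoint values $10^{-1}$ and $10^{1}$, and the log-linear schedule are illustrative assumptions:

```python
import numpy as np

# Sketch: a tanh proxy whose sharpness t grows log-linearly from T_min
# to T_max over N epochs; exact constants are assumptions.
T_min, T_max, N = 1e-1, 1e1, 100

def t_at(i):
    # log-linear interpolation: t(0) = T_min, t(N) = T_max
    return T_min * 10.0 ** (i / N * np.log10(T_max / T_min))

w = np.array([-0.4, 0.2, 0.7])
early = np.tanh(t_at(0) * w)    # nearly linear in w: gradients flow freely
late = np.tanh(t_at(N) * w)     # close to sign(w) away from zero
print(np.round(late, 2))
```

Early in training the proxy behaves like a scaled identity, so gradients reach all weights; late in training it saturates and the forward pass is effectively binary.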
As training progresses, $t \to T_{\max}$ and $F \to \operatorname{sign}$, ensuring binary-valued weights. The chain rule is used to propagate gradients through both $F$ and $\tilde{w}$:

$$\frac{\partial \mathcal{L}}{\partial w} = \big[(1-\alpha)I + \alpha R^\top\big]^\top \left(\frac{\partial \mathcal{L}}{\partial F(\tilde{w})} \odot F'(\tilde{w})\right), \qquad \frac{\partial \mathcal{L}}{\partial \alpha} = \sum_j \left(\frac{\partial \mathcal{L}}{\partial F(\tilde{w})}\right)_j F'(\tilde{w})_j\, \big(R^\top w - w\big)_j.$$
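These gradient expressions can be verified against finite differences; the tanh proxy and all variable names here are illustrative:

```python
import numpy as np

# Finite-difference check of the chain rule through F and w~.
rng = np.random.default_rng(2)
n, t, alpha = 5, 2.0, 0.3
w = rng.standard_normal(n)
R, _ = np.linalg.qr(rng.standard_normal((n, n)))   # orthogonal stand-in
g = rng.standard_normal(n)                         # upstream grad dL/dF(w~)

F = lambda x: np.tanh(t * x)
dF = lambda x: t * (1.0 - np.tanh(t * x) ** 2)

def loss(w, alpha):
    wt = (1.0 - alpha) * w + alpha * (R.T @ w)
    return g @ F(wt)

wt = (1.0 - alpha) * w + alpha * (R.T @ w)
gF = g * dF(wt)                                    # elementwise product
grad_w = ((1.0 - alpha) * np.eye(n) + alpha * R.T).T @ gF
grad_alpha = gF @ (R.T @ w - w)

eps = 1e-6
num_alpha = (loss(w, alpha + eps) - loss(w, alpha - eps)) / (2 * eps)
print(np.isclose(grad_alpha, num_alpha, atol=1e-5))
```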
5. Training Procedure and Algorithmic Outline
The RBNN training process alternates between rotation optimization and weight interpolation:
- Epoch-level Rotation: For each layer at the start of each epoch:
  - Freeze $w$.
  - Run three cycles of the alternating update steps for $B$, $R_1$, and $R_2$.
  - Compose $R = R_1 \otimes R_2$.
- Mini-batch Processing: For each mini-batch:
  - Compute $\tilde{w} = (1-\alpha)\,w + \alpha\,R^\top w$ per layer.
  - Forward: binarize $\tilde{w}$ using $F$; compute convolutions via XNOR/bitcount.
  - Backward: compute gradients using $F'$; update $w$ and $\alpha$.
Notably, convolutional and fully connected layers (excluding the first and last) are binarized to 1-bit weights and activations in all reported experiments.
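Putting the pieces together, the outline above can be sketched end-to-end on a toy single "layer": an epoch-level rotation refresh followed by mini-batch updates through the adjustable weight and a tanh proxy. The regression loss, data, and names are our assumptions, not the authors' implementation (which uses XNOR/bitcount convolutions rather than dense matmuls):

```python
import numpy as np

# Toy end-to-end sketch: bi-rotation refresh, then SGD through the
# adjustable weight and a tanh proxy for sign.
rng = np.random.default_rng(0)
n1, n2 = 4, 4
n = n1 * n2
w = rng.standard_normal(n)
alpha, t, lr = 0.1, 1.0, 0.05
x = rng.standard_normal((32, n))                  # mini-batch of inputs
y = x @ np.sign(rng.standard_normal(n))           # toy regression target

# Epoch-level rotation: three alternating cycles, then R = kron(R1, R2).
W = w.reshape(n2, n1)
R1, R2 = np.eye(n1), np.eye(n2)
for _ in range(3):
    B = np.sign(R2.T @ W @ R1)                    # B-step
    U, _, Vt = np.linalg.svd(W.T @ R2 @ B)        # R1-step (Procrustes)
    R1 = U @ Vt
    U, _, Vt = np.linalg.svd(W @ R1 @ B.T)        # R2-step (Procrustes)
    R2 = U @ Vt
R = np.kron(R1, R2)                               # orthogonal by construction

# Mini-batch processing: forward through the proxy, backward via chain rule.
losses = []
for _ in range(100):
    wt = (1 - alpha) * w + alpha * (R.T @ w)      # adjustable weight
    bw = np.tanh(t * wt)                          # proxy for sign(wt)
    pred = x @ bw
    losses.append(0.5 * np.mean((pred - y) ** 2))
    g_bw = x.T @ (pred - y) / len(x)              # dL/d(bw)
    gF = g_bw * t * (1 - np.tanh(t * wt) ** 2)    # through the proxy
    grad_w = ((1 - alpha) * np.eye(n) + alpha * R) @ gF
    grad_alpha = gF @ (R.T @ w - w)
    w = w - lr * grad_w
    alpha = float(np.clip(alpha - lr * grad_alpha, 0.0, 1.0))
```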
6. Empirical Evaluation
RBNN demonstrates consistent accuracy improvements over contemporary BNN approaches on standard benchmarks. Key results are summarized below.
CIFAR-10 Top-1 Accuracy
| Network | Method | W/A | Accuracy |
|---|---|---|---|
| ResNet-18 | Full-precision | 32/32 | 93.0% |
| | RAD | 1/1 | 90.5% |
| | IR-Net | 1/1 | 91.5% |
| | RBNN | 1/1 | 92.2% |
| ResNet-20 | Full-precision | 32/32 | 91.7% |
| | DoReFa | 1/1 | 79.3% |
| | DSQ | 1/1 | 84.1% |
| | IR-Net | 1/1 | 85.4% |
| | RBNN | 1/1 | 86.5% |
| | IR-Net* (Bi-Real structure) | 1/1 | 86.5% |
| | RBNN* (Bi-Real structure) | 1/1 | 87.8% |
| VGG-small | Full-precision | 32/32 | 91.7% |
| | LAB | 1/1 | 87.7% |
| | XNOR-Net | 1/1 | 89.8% |
| | BNN | 1/1 | 89.9% |
| | RAD | 1/1 | 90.0% |
| | IR-Net | 1/1 | 90.4% |
| | RBNN | 1/1 | 91.3% |
ImageNet Top-1 / Top-5 Accuracy
| Network | Method | W/A | Top-1 | Top-5 |
|---|---|---|---|---|
| ResNet-18 | Full-precision | 32/32 | 69.6% | 89.2% |
| | ABC-Net | 1/1 | 42.7% | 67.6% |
| | XNOR-Net | 1/1 | 51.2% | 73.2% |
| | BNN+ | 1/1 | 53.0% | 72.6% |
| | DoReFa | 1/2 | 53.4% | – |
| | Bi-Real | 1/1 | 56.4% | 79.5% |
| | XNOR++ | 1/1 | 57.1% | 79.9% |
| | IR-Net | 1/1 | 58.1% | 80.0% |
| | RBNN | 1/1 | 59.9% | 81.9% |
| ResNet-34 | Full-precision | 32/32 | 73.3% | 91.3% |
| | ABC-Net | 1/1 | 52.4% | 76.5% |
| | Bi-Real | 1/1 | 62.2% | 83.9% |
| | IR-Net | 1/1 | 62.9% | 84.1% |
| | RBNN | 1/1 | 63.1% | 84.4% |
Compared to IR-Net, RBNN improves top-1 accuracy by 1.8% on ImageNet with ResNet-18 (59.9% vs. 58.1%) and by 1.1% on CIFAR-10 with ResNet-20 (86.5% vs. 85.4%); the margin over XNOR-Net on ImageNet ResNet-18 is 8.7%.
RBNN also produces significantly higher weight flip rates—about 50% per layer, versus 5–10% for standard BNNs—maximizing the entropy of the binary states and more fully exploiting the $2^n$ configurations available to an $n$-dimensional binary weight vector. The reduction in angular bias and quantization error is evident per layer, and RBNN weight distributions exhibit pronounced bimodality at $\pm 1$.
7. Implementation and Reproducibility
RBNN is implemented in PyTorch. Principal modules include:
- `rotation.py` (bi-rotation optimization)
- `adjustable_weight.py` (computing $\tilde{w}$ and scheduling $\alpha$)
- `sign_approx.py` (forward proxy $F$ and its backward $F'$)
- `training.py` (overall training workflow)
Reproducibility is supported through fixed random seeds and configuration files specifying hyperparameter schedules (including the learning rate, the sharpness $t$, and the interpolation gate $\alpha$). Pre-trained full-precision models and all source artifacts are available at https://github.com/lmbxmu/RBNN. The training setups use SGD with momentum 0.9 and weight decay, with standardized schedules for learning rate and batch size (Lin et al., 2020).
Distinctively, RBNN rotates and binarizes all convolutional and fully connected layers except the network’s first and last, following prevailing best practices for 1-bit DNN quantization.
RBNN systematically addresses quantization error in binarized neural networks through angular alignment, maximized weight flips, and an adjustable binarization surrogate. The empirical evaluation confirms notable accuracy advantages and higher information capacity compared to existing 1-bit quantization methods (Lin et al., 2020).