4/6 Adaptive Scaling in Neural Networks
- 4/6 Adaptive Scaling reduces quantization error and improves rate-distortion metrics by dynamically selecting between candidate scaling strategies, in both CNN downsampling and NVFP4 quantization.
- In CNNs, the method pairs stride-1 convolution with bilinear interpolation for learnable fractional downsampling; in NVFP4, it chooses per block between dual candidate scales (Δ(6) and Δ(4)), lowering MSE while remaining hardware efficient.
- Empirical results show BD-rate gains in PSNR, SSIM, and VMAF for video downsampling and higher normalized accuracy in quantized language models, confirming the technique's impact on training stability and inference precision.
Four Over Six (4/6) Adaptive Scaling refers to distinct advancements in two domains: fractional spatial scaling in convolutional neural networks (CNNs) for image/video processing, and adaptive quantization in low-precision matrix formats (notably NVFP4) for large-scale neural network training and inference. In both contexts, the "4/6" ratio is leveraged to dynamically select between candidate scaling strategies, optimizing for reduced quantization error or improved representational fidelity. The technique’s adoption addresses core challenges such as nonuniform quantization artifacts in FP4-based computation on NVIDIA Blackwell GPUs (Cook et al., 1 Dec 2025) and suboptimal fractional downsampling in CNNs for video bitrate adaptation (Chen et al., 2021).
1. Motivation and Context
Fractional scaling operations and block-adaptive quantization both arise from limitations in classical approaches:
- In CNN-based image/video processing, standard convolutions and pooling operate with integer strides, allowing only fixed-integer spatial scaling. However, typical industry requirements, such as 1080p→720p (scaling factor 1.5), dictate the need for precise, learnable fractional downsampling, as found in bitrate-adaptive streaming.
- For neural network quantization, particularly in emerging 4-bit floating-point (FP4) formats such as NVFP4 on Blackwell hardware, fixed blockwise scaling centered on the largest representable FP4 value (±6) induces large quantization errors for values near the maximal range, undermining training stability and post-training quantization (PTQ) accuracy (Cook et al., 1 Dec 2025). Unlike classical integer quantization, FP4 does not have uniformly distributed representable points, which amplifies quantization error for "near-edge" values.
Both classes of models, vision networks and LLMs, thus require an adaptive scaling mechanism that improves accuracy and robustness via dynamic scale-factor selection.
2. Mathematical Formulation
2.1. Fractional Downsampling in CNNs
Let $s = 4/6 = 2/3$, so $1/s = 1.5$ is the upscaling factor. For input $X \in \mathbb{R}^{C \times H \times W}$, the conv-resize (4/6) block implements:
- Convolution (stride 1): $Y = \mathrm{Conv}_{k \times k}(X)$, preserving the spatial dimensions $H \times W$.
- Differentiable resizer: bilinear interpolation with scale factor $s$, giving $Z = \mathrm{Bilinear}(Y, s) \in \mathbb{R}^{C' \times \lfloor sH \rfloor \times \lfloor sW \rfloor}$.
Backpropagation passes through the bilinear weights.
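As a quick worked check of the shape arithmetic for the streaming case cited above:

$$
s = \tfrac{4}{6} = \tfrac{2}{3}, \qquad (H, W) = (1080,\ 1920) \ \longrightarrow\ (sH,\ sW) = (720,\ 1280),
$$

i.e. exactly the 1080p→720p target, a 1.5× reduction per spatial axis.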
2.2. NVFP4 Quantization with 4/6
For each 16-element block $B = \{x_1, \ldots, x_{16}\}$ with $a_{\max} = \max_i |x_i|$:
- Two candidate scales: $\Delta^{(6)} = a_{\max}/6$ and $\Delta^{(4)} = a_{\max}/4$.
- Cast each scale to FP8 (E4M3), then quantize/dequantize the block under both: $\hat{x}^{(s)}_i = \Delta^{(s)} \cdot \mathrm{FP4}\!\left(x_i / \Delta^{(s)}\right)$ for $s \in \{4, 6\}$.
- Select the scale minimizing the block reconstruction error $\mathrm{MSE}^{(s)} = \tfrac{1}{16} \sum_i \big(x_i - \hat{x}^{(s)}_i\big)^2$.
This per-block selection ensures that the representational gap between 66.6% and 100% of the original block maximum (where FP4 has no representable value) does not systematically degrade accuracy (Cook et al., 1 Dec 2025).
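A minimal sketch of this per-block selection, simulated in PyTorch rather than the fused kernels of Section 3.2 (the function names and the explicit E2M1 value grid are illustrative assumptions; the FP8 cast of the scale is omitted here for clarity):

```python
import torch

# E2M1 (FP4) representable magnitudes; sign is handled separately.
_FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def _round_to_fp4(x: torch.Tensor) -> torch.Tensor:
    """Round each element to the nearest representable E2M1 value."""
    mag = x.abs().unsqueeze(-1)                       # (..., 1)
    idx = (mag - _FP4_GRID).abs().argmin(dim=-1)      # nearest grid point
    return torch.sign(x) * _FP4_GRID[idx]

def quantize_block_4over6(block: torch.Tensor):
    """Quantize one 16-element block, picking the scale (amax/6 vs amax/4)
    that minimizes reconstruction MSE. Returns (dequantized block, scale)."""
    amax = block.abs().max()
    if amax == 0:
        return block.clone(), 1.0
    best, best_mse, best_scale = None, float("inf"), None
    for divisor in (6.0, 4.0):                        # Delta(6), then Delta(4)
        scale = (amax / divisor).item()
        deq = _round_to_fp4(block / scale) * scale    # quantize -> dequantize
        mse = torch.mean((block - deq) ** 2).item()
        if mse < best_mse:
            best, best_mse, best_scale = deq, mse, scale
    return best, best_scale
```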
3. Implementation Details
3.1. Fractional Scaling in Networks
To replicate 4/6 adaptive scaling in vision models, replace integer-strided layers with a ConvResize4over6 module, which first applies convolution at stride 1, followed by out-of-place bilinear resizing by the 4/6 ratio (a 1.5× downscale). PyTorch and other frameworks propagate gradients natively through both stages. For a precise target output shape $H' \times W'$, explicit output sizing may replace the scale-factor argument to avoid boundary effects (Chen et al., 2021).
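A plausible PyTorch sketch of such a module (the class name follows the text above; the kernel size, padding, and channel counts are assumptions):

```python
from typing import Optional, Tuple

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvResize4over6(nn.Module):
    """Stride-1 convolution followed by a differentiable bilinear resize by
    4/6 (a 1.5x spatial downscale). A sketch of the block described above;
    kernel size, padding, and channel counts are illustrative choices."""

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3,
                 out_size: Optional[Tuple[int, int]] = None):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size,
                              stride=1, padding=kernel_size // 2)
        self.out_size = out_size  # explicit (H, W) target avoids rounding drift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.conv(x)
        if self.out_size is not None:
            return F.interpolate(y, size=self.out_size,
                                 mode="bilinear", align_corners=False)
        # scale_factor path; floor rounding can drop a pixel at some sizes,
        # which is why explicit sizing is recommended in the text.
        return F.interpolate(y, scale_factor=4 / 6,
                             mode="bilinear", align_corners=False)

# Example: 1080p feature map -> 720p.
block = ConvResize4over6(3, 64, out_size=(720, 1280))
out = block(torch.randn(1, 3, 1080, 1920))   # shape: (1, 64, 720, 1280)
```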
3.2. NVFP4 4/6 Adaptive Scaling on Blackwell
NVFP4 packs each 16-element block as an FP8 scale factor plus 16 FP4 values. The 4/6 algorithm computes dequantized FP16 blocks and MSEs for both $\Delta^{(6)}$ and $\Delta^{(4)}$ entirely in registers using high-throughput CUDA/PTX kernels. Given Blackwell's hardware support for the E2M1 (FP4) and E4M3 (FP8) formats, the runtime overhead is minimal for both inference and large-batch training (Cook et al., 1 Dec 2025). All major matrix-multiply operands (weights, activations, and gradients) are compatible with the revised quantization scheme.
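For reference, a host-side PyTorch simulation of the blockwise layout and per-block 4/6 selection described here (the function name and rounding details are simplifying assumptions, and a build with `torch.float8_e4m3fn` support is assumed; the production path uses the fused kernels, not this loop):

```python
import torch

FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def nvfp4_4over6_fake_quant(w: torch.Tensor, block: int = 16) -> torch.Tensor:
    """Blockwise fake-quantization in an NVFP4-like layout: 16-element blocks
    along the last dim, one FP8 (E4M3) scale per block, and a per-block choice
    between amax/6 and amax/4 by reconstruction MSE."""
    orig_shape = w.shape
    x = w.reshape(-1, block)                                   # (n_blocks, 16)
    amax = x.abs().amax(dim=1, keepdim=True)                   # (n_blocks, 1)

    candidates = []
    for divisor in (6.0, 4.0):                                 # Delta(6), Delta(4)
        scale = amax / divisor
        # Round-trip the scale through FP8 E4M3, as the format stores it.
        scale = scale.to(torch.float8_e4m3fn).to(torch.float32).clamp_min(1e-12)
        q = x / scale
        # Nearest E2M1 magnitude, sign restored afterwards.
        idx = (q.abs().unsqueeze(-1) - FP4_GRID).abs().argmin(dim=-1)
        deq = torch.sign(q) * FP4_GRID[idx] * scale
        mse = ((x - deq) ** 2).mean(dim=1, keepdim=True)
        candidates.append((mse, deq))

    # Per-block selection of the lower-error candidate.
    use_four = candidates[1][0] < candidates[0][0]             # (n_blocks, 1)
    out = torch.where(use_four, candidates[1][1], candidates[0][1])
    return out.reshape(orig_shape)

# Example: fake-quantize a weight matrix whose row length is a multiple of 16.
w = torch.randn(256, 1024)
w_q = nvfp4_4over6_fake_quant(w)
```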
4. Empirical Results
4.1. Image/Video Downsampling
For adaptive bitrate video streaming (1080p→720p, scaling factor 1.5), conv-resize 4/6 blocks applied to deep downsampling architectures yield improved BD-rate metrics over commonly used Lanczos and bicubic methods:
| Upsampling | PSNR BD-rate gain | SSIM BD-rate gain | VMAF BD-rate gain |
|---|---|---|---|
| H.264 + bilinear | –4.06% | –2.47% | –1.20% |
| Bicubic | –2.22% | –1.19% | –0.77% |
Negative BD-rate indicates required bitrate reduction for constant quality (Chen et al., 2021).
4.2. NVFP4 Quantization
- Training Stability: Conventional end-to-end NVFP4 training with stochastic rounding, Hadamard transforms, block scales, and BF16 healing diverges on 340M- and 1.3B-parameter Transformers and hybrid models. Adding 4/6 prevents this divergence, with the training loss closely tracking the BF16 baseline (Cook et al., 1 Dec 2025).
- PTQ Performance: For Llama-3-8B and Qwen3-8B (W4A4, RTN PTQ), 4/6 reduces WikiText-2 perplexity by 10–20% (e.g., 8.43 → 8.30), and with AWQ or SmoothQuant it narrows the perplexity gap to BF16 by 5%.
- Downstream Accuracy: Across BoolQ, ARC, and HellaSwag, applying 4/6 increases normalized accuracy by 1–2 points for multiple quantization regimes.
5. Practical Considerations and Usage
- Hardware Synergy: The 4/6 algorithm aligns with Blackwell GPU instruction sets supporting efficient conversion between FP4 and FP8, enabling seamless integration with minimal runtime impact (Cook et al., 1 Dec 2025).
- In Vision Models: For fractional scaling, aligning H, W to multiples of 6 avoids rounding artifacts. Always specify `align_corners=False` for stable gradients during bilinear rescaling.
- Code Integration: Minimal PyTorch, CUDA, or equivalent APIs suffice to integrate these blocks into state-of-the-art architectures, without modifying downstream layers or requiring special regularization. The per-block quantization process is compatible with post-training quantization approaches such as AWQ and SmoothQuant.
6. Significance and Broader Impact
The Four Over Six (4/6) adaptive scaling strategy constitutes an efficient, hardware-friendly, and empirically validated method for addressing quantization and downsampling bottlenecks:
- For FP4/NVFP4 quantization, it mitigates the disproportionate error on near-maximal values, which have been empirically identified as primary causes for both divergence during training and degraded inference accuracy in large-scale neural networks.
- For vision networks, learned fractional scaling blocks offer a path to improved coding efficiency and quality preservation in applications requiring non-integer spatial transformations.
A plausible implication is that as low-precision training proliferates across domains, blockwise-adaptive schemes like 4/6 will gain increasing importance, especially in future hardware-accelerated environments and large-scale deployment settings (Cook et al., 1 Dec 2025, Chen et al., 2021).