Outperforming 2–3-bit PTQ with sub-1-bit PTQ
Develop sub-1-bit weight-only post-training quantization (PTQ) methods for large language models that achieve higher accuracy than 2-bit and 3-bit PTQ baselines while remaining in the sub-binary regime, i.e., below one bit per weight.
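To make the sub-binary target concrete, the sketch below shows the storage accounting for grouped codebook (vector) quantization, one common route to rates below one bit per weight. This is an illustrative assumption, not NanoQuant's actual scheme; the function name and all parameter values are hypothetical.

```python
def effective_bits_per_weight(num_weights: int,
                              group_size: int,
                              index_bits: int,
                              codebook_entries: int) -> float:
    """Storage cost per weight for grouped codebook quantization.

    Each group of `group_size` weights is replaced by one
    `index_bits`-bit index into a shared codebook whose
    `codebook_entries` codewords are stored in fp16.
    """
    num_groups = num_weights / group_size
    payload_bits = num_groups * index_bits              # per-group indices
    overhead_bits = codebook_entries * group_size * 16  # fp16 codewords
    return (payload_bits + overhead_bits) / num_weights

# Hypothetical setting: 8-weight groups with 4-bit indices give
# 4/8 = 0.5 bits/weight before codebook overhead, which is
# negligible at LLM scale (e.g., one 4096x4096 weight matrix).
print(f"{effective_bits_per_weight(4096 * 4096, 8, 4, 16):.4f}")
```

The point of the accounting is that the per-weight rate is set by the ratio of index bits to group size, so sub-1-bit rates require either long groups or very small indices, which is exactly where the accuracy gap to 2- and 3-bit PTQ opens up.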
References
Additionally, while NanoQuant outperforms 2-bit baselines, further improving sub-binary methods to consistently surpass 2- and 3-bit PTQ performance remains an open challenge for the sub-binary regime.
— NanoQuant: Efficient Sub-1-Bit Quantization of Large Language Models (Chong et al., arXiv:2602.06694, 6 Feb 2026), Subsection: Limitations and Future Work