Outperforming 2–3 bit PTQ with sub-1-bit PTQ

Develop sub-1-bit weight-only post-training quantization methods for large language models that achieve higher accuracy than 2-bit and 3-bit post-training quantization baselines while remaining in the sub-binary (≤1 bit per weight) regime.

Background

The paper introduces NanoQuant, a post-training quantization (PTQ) approach that achieves binary and sub-1-bit compression of LLMs using low-rank binary factorization with ADMM-based initialization and hierarchical reconstruction. While NanoQuant demonstrates competitive or superior results to several binary PTQ baselines and even approaches QAT performance with far less data and compute, the authors note a remaining performance gap relative to higher-bit PTQ settings.
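To make the low-rank binary factorization idea concrete, the sketch below approximates a weight matrix as a scale times the product of two {-1, +1} factors. This is a simplified alternating sign-rounding heuristic with an SVD-based start, not the paper's actual ADMM initialization or hierarchical reconstruction; the function name and all details are illustrative assumptions.

```python
import numpy as np

def binary_lowrank_factorize(W, rank, iters=25):
    """Approximate W (m x n) as s * (Bu @ Bv), with Bu in {-1,+1}^{m x r},
    Bv in {-1,+1}^{r x n}, and a single least-squares scale s.
    Illustrative heuristic only -- NOT NanoQuant's ADMM procedure."""
    # Initialize the binary factors from the signs of a truncated SVD.
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    Bu = np.sign(U[:, :rank]); Bu[Bu == 0] = 1.0
    Bv = np.sign(Vt[:rank, :]); Bv[Bv == 0] = 1.0
    for _ in range(iters):
        # Crude alternating updates: re-binarize each factor against W.
        Bu = np.sign(W @ Bv.T); Bu[Bu == 0] = 1.0
        Bv = np.sign(Bu.T @ W); Bv[Bv == 0] = 1.0
    P = Bu @ Bv
    s = np.sum(W * P) / np.sum(P * P)  # closed-form optimal scale
    return s, Bu, Bv

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
s, Bu, Bv = binary_lowrank_factorize(W, rank=8)
err = np.linalg.norm(W - s * Bu @ Bv) / np.linalg.norm(W)
```

With the least-squares scale, the relative reconstruction error is guaranteed to stay below 1; how close the heuristic gets to the true rank-r optimum depends on the initialization, which is precisely why the paper invests in a principled ADMM-based start.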

In particular, although NanoQuant sometimes surpasses 2-bit baselines, the broader goal of consistently exceeding the performance of 2-bit and 3-bit PTQ methods when operating in the sub-binary (≤1-bit) regime is explicitly identified as an unresolved challenge. Closing this gap would establish sub-1-bit PTQ as a strictly preferable alternative to higher-bit PTQ in accuracy as well as memory efficiency.
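The memory-efficiency side of the claim is easy to quantify. The sketch below computes the effective bits per weight of a binary low-rank factorization under a hypothetical accounting (two 1-bit factors plus one fp16 scale, amortized over the original matrix); real methods add per-group scales and metadata, so treat the numbers as a lower bound on storage.

```python
def effective_bits_per_weight(m, n, rank, scale_bits=16):
    """Bits per original weight for s * (Bu @ Bv): each {-1,+1} factor costs
    1 bit per entry, plus one fp16 scale. Hypothetical accounting; real
    schemes carry extra per-group scales and metadata."""
    bits = rank * (m + n) + scale_bits
    return bits / (m * n)

# A 4096 x 4096 layer factored at rank 256 lands well under 1 bit/weight,
# i.e. far below the 2- and 3-bit PTQ baselines it aims to outperform.
bpw = effective_bits_per_weight(4096, 4096, rank=256)
```

Because the factor cost scales as r(m + n) while the matrix holds mn weights, any rank below mn / (m + n) keeps the representation sub-binary; the open question is whether accuracy at such ranks can consistently beat 2- and 3-bit PTQ.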

References

"Additionally, while NanoQuant outperforms 2-bit baselines, further enhancing capabilities to outperform higher-bit 2 or 3-bit PTQ performance remains an open challenge for the sub-binary regime."

NanoQuant: Efficient Sub-1-Bit Quantization of Large Language Models (Chong et al., arXiv:2602.06694, 6 Feb 2026), Subsection: Limitations and Future Work.