The Binary and Ternary Quantization Can Improve Feature Discrimination (2504.13792v1)

Published 18 Apr 2025 in cs.LG

Abstract: In machine learning, quantization is widely used to simplify data representation and facilitate algorithm deployment on hardware. Given the fundamental role of classification in machine learning, it is crucial to investigate the impact of quantization on classification. Current research primarily focuses on quantization errors, operating under the premise that higher quantization errors generally result in lower classification performance. However, this premise lacks a solid theoretical foundation and often contradicts empirical findings. For instance, certain extremely low bit-width quantization methods, such as $\{0,1\}$-binary quantization and $\{0, \pm 1\}$-ternary quantization, can achieve comparable or even superior classification accuracy compared to the original non-quantized data, despite exhibiting high quantization errors. To more accurately evaluate classification performance, we propose to directly investigate the feature discrimination of quantized data, instead of analyzing its quantization error. Interestingly, it is found that both binary and ternary quantization methods can improve, rather than degrade, the feature discrimination of the original data. This remarkable performance is validated through classification experiments across various data types, including images, speech, and texts.

Summary

  • The paper demonstrates that binary and ternary quantization can enhance feature discrimination by shifting focus from quantization error to class separability.
  • Methodologically, it derives precise conditions involving quantization thresholds and data parameters, and validates results through both synthetic and real-world experiments.
  • Practically, the approach offers computational efficiency and robust classification across various datasets and models by leveraging low-bit data representations.

This paper (2504.13792) investigates the impact of binary and ternary quantization on classification performance, proposing a novel evaluation metric based on feature discrimination rather than the traditional quantization error. The conventional wisdom posits that higher quantization error leads to decreased accuracy, a premise that often contradicts empirical observations where aggressive quantization methods like binary ($\{0, 1\}$ or $\{-1, 1\}$) and ternary ($\{0, \pm 1\}$) can yield comparable or even superior results.

The core idea is to directly measure the ability of quantized data to separate different classes. Following Fisher's linear discriminant analysis, feature discrimination is defined as the ratio of expected inter-class squared distance to expected intra-class squared distance. A higher discrimination value suggests that classes are more separable, leading to better classification performance.
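
Concretely, writing $x$ and $x'$ for two feature vectors, one way to state this ratio (notation introduced here for illustration, matching the prose definition above) is:

$$D = \frac{\mathbb{E}\left[\lVert x - x' \rVert^2 \mid x,\, x' \text{ from different classes}\right]}{\mathbb{E}\left[\lVert x - x' \rVert^2 \mid x,\, x' \text{ from the same class}\right]}$$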

The theoretical analysis models data from two classes as vectors where each element (feature dimension) follows a Gaussian distribution. After standardization, the distributions of a single feature element for the two classes simplify to $N(\mu, \sigma^2)$ and $N(-\mu, \sigma^2)$, with the constraint $\mu^2 + \sigma^2 = 1$. The paper then derives conditions under which the feature discrimination of binary ($D_b$) and ternary ($D_t$) quantized data is greater than that of the original non-quantized data ($D$). These conditions are expressed as inequalities involving the quantization threshold $\tau$, $\mu$, $\sigma$, and the cumulative distribution function of the standard normal distribution.
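
For reference, a standard threshold-based form of the two quantizers, consistent with the $\{0,1\}$ and $\{0, \pm 1\}$ alphabets named in the abstract (the paper's exact parameterization may differ), is:

$$Q_b(x) = \begin{cases} 1, & x > \tau \\ 0, & x \le \tau \end{cases} \qquad Q_t(x) = \begin{cases} \operatorname{sign}(x), & |x| > \tau \\ 0, & |x| \le \tau \end{cases}$$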

Theorems prove that both binary and ternary quantization can improve feature discrimination if an appropriate threshold $\tau$ exists that satisfies the derived inequalities. The theoretical analysis suggests that this improvement is more likely when the original data are already reasonably separable, corresponding to a sufficiently large $\mu$.

Numerical simulations validate the theoretical findings. By examining the inequalities across different $\tau$ values and data distribution parameters $\mu$ and $\sigma^2$, the paper demonstrates the existence of threshold ranges where quantization increases feature discrimination. It is shown that ternary quantization tends to offer a broader range of $\mu$ values for which discrimination improvement is possible compared to binary quantization.
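
The paper's conditions are analytic, but the qualitative effect is easy to probe empirically. Below is a minimal Monte Carlo sketch under the same two-Gaussian model, with parameter values chosen arbitrarily for illustration (this is not the paper's simulation code):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 0.8
sigma = np.sqrt(1.0 - mu**2)      # standardization constraint: mu^2 + sigma^2 = 1
n, d = 200, 64                    # samples per class, feature dimension

# Two classes whose feature elements follow N(+mu, sigma^2) and N(-mu, sigma^2)
X_pos = rng.normal(+mu, sigma, size=(n, d))
X_neg = rng.normal(-mu, sigma, size=(n, d))

def mean_sq_dist(A, B):
    """Mean squared Euclidean distance over all row pairs of A and B."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1).mean()

def discrimination(A, B):
    """Empirical inter-class / intra-class squared-distance ratio
    (self-pairs are included in the intra term; fine for a rough check)."""
    inter = mean_sq_dist(A, B)
    intra = 0.5 * (mean_sq_dist(A, A) + mean_sq_dist(B, B))
    return inter / intra

def ternary(X, tau):
    """{0, +1, -1} quantization with threshold tau."""
    return np.sign(X) * (np.abs(X) > tau)

D_orig = discrimination(X_pos, X_neg)
for tau in np.linspace(0.0, 1.5, 7):
    D_t = discrimination(ternary(X_pos, tau), ternary(X_neg, tau))
    print(f"tau={tau:.2f}  ternary D={D_t:.3f}  original D={D_orig:.3f}")
```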

Classification experiments are conducted on both synthetic data (generated according to the Gaussian mixture model) and real-world datasets spanning images (YaleB, CIFAR10, ImageNet1000 features), speech (TIMIT features), and text (Newsgroup features). The experiments use various classifiers including k-Nearest Neighbors (KNN) with Euclidean and cosine distances, Support Vector Machines (SVM), Multilayer Perceptrons (MLP), and Decision Trees.

Key experimental results demonstrate:

  • For both synthetic and real data, there are specific ranges of the quantization threshold $\tau$ where binary and ternary quantization achieve classification accuracy comparable to or better than using original full-precision data.
  • Ternary quantization generally provides a wider range of effective thresholds and often leads to better performance than binary quantization.
  • The benefits of quantization are observed across different classifiers, although KNN with Euclidean distance appears particularly robust.
  • On synthetic data, classification accuracy closely tracks feature discrimination values across varying thresholds, confirming feature discrimination as a better indicator of classification performance than quantization error.
  • Even though real-world data features do not perfectly follow Gaussian distributions, the approach is robust, partly because the distribution of feature element values across different classes often shows a bimodal tendency (strong/weak presence) which can be approximated by two Gaussian components.
  • The findings generalize to multiclass classification, where a uniform threshold applied per dimension effectively leverages the observed binary nature of feature attributes.

Practical Implementation Considerations:

  • Threshold Selection: Finding the optimal quantization threshold $\tau$ is crucial. The paper suggests that the beneficial range of $\tau$ depends on the data distribution parameters ($\mu$, $\sigma$) of individual features. For practical implementation, especially on high-dimensional data, a simple approach used in the experiments is to apply a uniform threshold $\tau = \gamma \cdot \eta$, where $\eta$ is the average magnitude of feature elements across the dataset and $\gamma$ is a scaling parameter searched over a narrow range (e.g., $[0, 1]$); a sketch of this rule appears after this list. More sophisticated per-dimension or data-dependent thresholding could potentially yield better results. Gradient-descent-based optimization methods for finding $\tau$ are theoretically outlined in the appendix.
  • Computational Efficiency: Quantization leads to significant computational and memory savings. Representing data or model weights with 1 or 2 bits allows for packed storage and bitwise operations, which are much faster and consume less energy than floating-point arithmetic (see the packed-bit sketch after this list). This is a primary driver for using low-bit quantization in practice, especially for deployment on resource-constrained hardware. The paper's findings suggest that these benefits can be achieved without sacrificing accuracy, and in some cases even while improving it.
  • Data Characteristics: The theoretical results are strongest when data features per class are approximately Gaussian and separable (large $\mu$). For real data, while the Gaussian assumption is often not strictly met, the observed performance improvement suggests the approach is robust to deviations, particularly when features exhibit a separable, bimodal distribution. Highly sparse or highly overlapping data distributions might pose challenges.
  • Classifier Choice: While the theoretical analysis is rooted in linear discrimination, experiments show that non-linear classifiers like MLP and decision trees can also benefit, as they build upon linear operations. KNN with Euclidean distance performed well in experiments, potentially due to its reliance on instance-based distances which are well-behaved under the proposed quantization.
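
A minimal sketch of the uniform-threshold rule described in the Threshold Selection item, assuming a generic feature matrix X and label vector y and using scikit-learn's KNN (the paper's exact search grid and classifier settings may differ):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def ternary_quantize(X, tau):
    """Map each feature to {-1, 0, +1} with a uniform threshold tau."""
    return np.sign(X) * (np.abs(X) > tau)

def sweep_gamma(X, y, gammas=np.linspace(0.0, 1.0, 11)):
    """Try tau = gamma * eta, where eta is the mean |feature| magnitude."""
    eta = np.abs(X).mean()
    scores = {}
    for g in gammas:
        Xq = ternary_quantize(X, g * eta)
        clf = KNeighborsClassifier(n_neighbors=5)
        scores[g] = cross_val_score(clf, Xq, y, cv=5).mean()
    return scores

# Usage: scores = sweep_gamma(X, y); pick the gamma with the best score.
```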
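
As for the efficiency argument, an illustrative sketch (not from the paper) of why low-bit codes pay off computationally: {0, 1} codes can be packed eight per byte, and distances between them reduce to XOR plus popcount instead of floating-point arithmetic.

```python
import numpy as np

def pack_binary(X_bin):
    """Pack a {0,1} feature matrix into bytes, 8 features per byte."""
    return np.packbits(X_bin.astype(np.uint8), axis=1)

def hamming_distance(code_a, code_b):
    """Hamming distance between two packed codes via XOR + popcount."""
    return int(np.unpackbits(code_a ^ code_b).sum())
```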

In conclusion, this research provides a theoretical and empirical basis for understanding how binary and ternary quantization can enhance feature discrimination and, consequently, classification performance. It shifts the focus from minimizing quantization error to maximizing class separability through quantization, offering valuable insights for designing efficient and potentially more accurate machine learning systems, particularly those leveraging low-bit data representations.