Fair-GPTQ: Fairness-Aware Quantization for LLMs
- Fair-GPTQ is a method that applies group-fairness constraints during quantization to mitigate bias in LLM weights while ensuring efficient deployment.
- It formulates quantization as a constrained quadratic program with an added bias penalty, directly targeting stereotype-induced representation shifts.
- Empirical evaluations demonstrate that Fair-GPTQ reduces bias metrics with minimal accuracy loss and enables 4-bit quantization for significant memory and speed gains.
Fairness-aware constraints in model quantization, as embodied by the Fair-GPTQ method, offer a principled way to mitigate group-level bias when LLMs must be deployed under tight compute budgets. Because group-fairness considerations are built directly into the quantization process, these constraints allow efficient deployment while maintaining or improving fairness with respect to stereotypical, occupational, and discriminatory content across protected groups such as gender, race, and religion (Proskurina et al., 18 Sep 2025).
1. Mathematical Formulation of Fair-GPTQ
The foundational objective of GPTQ quantization is to minimize the reconstruction error of the input-weight product on calibration data for each model layer. Specifically, for a full-precision weight matrix $W$ and calibration hidden states $X$, GPTQ solves:

$$\hat{W} = \arg\min_{\hat{W}} \lVert W X - \hat{W} X \rVert_2^2 .$$
The quantization is formulated as a constrained quadratic program over the flattened weights, employing a second-order Taylor expansion and leveraging block-diagonal Hessians for tractable approximation.
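As a minimal NumPy sketch on random data, the layer objective and its Hessian can be checked numerically. The names `layer_recon_error` and `gptq_hessian` are illustrative, not taken from any GPTQ codebase. The check confirms that the squared reconstruction error equals a per-row quadratic form with the shared Hessian $H = 2XX^\top$, which is why the Hessian of the flattened weights is block-diagonal.

```python
import numpy as np

def layer_recon_error(W, W_hat, X):
    """GPTQ layer objective: squared Frobenius norm ||W X - W_hat X||_F^2."""
    return float(np.sum((W @ X - W_hat @ X) ** 2))

def gptq_hessian(X):
    """Hessian of the objective w.r.t. any single row of W_hat: H = 2 X X^T.
    Every row shares it, so the flattened-weight Hessian is block-diagonal."""
    return 2.0 * X @ X.T

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))      # d_out x d_in full-precision weights
X = rng.normal(size=(8, 32))     # d_in x n calibration hidden states
W_hat = np.round(W * 4) / 4      # crude rounding to a 0.25 grid, for illustration only
H = gptq_hessian(X)
delta = W_hat - W
# Per-row quadratic form: sum_i delta_i H delta_i^T / 2
quad = 0.5 * float(np.einsum("ij,jk,ik->", delta, H, delta))
```

Because the objective is exactly quadratic in the weight perturbation, the second-order expansion used by GPTQ is exact rather than approximate at the layer level.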
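Fair-GPTQ extends this objective with a fairness-aware bias penalty. For paired inputs that differ only in a protected-attribute token (e.g., "he" vs. "she"), let $X_0$ and $X_1$ denote the hidden states of the stereotypical and anti-stereotypical variants. With $D = X_0 - X_1$, the modified quantization objective per layer is:

$$\hat{W} = \arg\min_{\hat{W}} \lVert W X - \hat{W} X \rVert_2^2 + \alpha \lVert \hat{W} D \rVert_2^2 ,$$

where $\alpha \ge 0$ regulates the strength of the bias penalty. The penalty targets the representation shift along the directions spanned by $D$, discouraging the quantized weights from encoding discriminatory behavior.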
The solution relies on the block-diagonality of the Hessian, permitting efficient closed-form updates and quantization with tractable overhead. This group-fairness constraint is data-driven, using stereotype/anti-stereotype calibration pairs from evaluation benchmarks such as StereoSet (Proskurina et al., 18 Sep 2025).
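The sketch below assumes the bias penalty takes the form $\alpha\lVert \hat{W}(X_0 - X_1)\rVert_2^2$ and shows its effect on the quadratic program. The helper names `fair_objective` and `fair_hessian` are hypothetical, and the random "paired" hidden states stand in for real stereotype/anti-stereotype calibration pairs. Under this assumption the penalty simply adds $2\alpha DD^\top$ to the shared per-row Hessian, so the block-diagonal structure that GPTQ relies on is preserved.

```python
import numpy as np

def fair_objective(W, W_hat, X, X0, X1, alpha):
    """Assumed Fair-GPTQ layer objective:
    ||W X - W_hat X||_F^2 + alpha * ||W_hat (X0 - X1)||_F^2."""
    recon = np.sum((W @ X - W_hat @ X) ** 2)   # standard GPTQ reconstruction term
    D = X0 - X1                                 # shift induced by swapping the protected attribute
    bias = np.sum((W_hat @ D) ** 2)             # how strongly the layer propagates that shift
    return float(recon + alpha * bias)

def fair_hessian(X, X0, X1, alpha):
    """Per-row Hessian shared by all rows of W_hat: 2 (X X^T + alpha D D^T)."""
    D = X0 - X1
    return 2.0 * (X @ X.T + alpha * D @ D.T)

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 8))
X = rng.normal(size=(8, 64))                    # general calibration hidden states
X0 = rng.normal(size=(8, 16))                   # "stereotype" hidden states (random stand-in)
X1 = X0 + 0.3 * rng.normal(size=(8, 16))        # paired "anti-stereotype" hidden states
W_hat = np.round(W * 4) / 4                     # crude quantized weights for illustration
```

With `alpha=0` the function reduces to the plain GPTQ reconstruction error; larger `alpha` trades reconstruction fidelity for smaller attribute-induced output differences.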
2. Fairness Metrics and Evaluation Paradigm
Fair-GPTQ does not directly incorporate classical group-fairness metrics, such as statistical parity difference or equality of opportunity, into the quantization loss. Instead, its fairness term is built from paired inputs that differ only in a protected attribute and penalizes the difference between their layer outputs. After quantization, fairness is measured empirically on established LLM bias benchmarks:
- CrowS-Pairs: Proportion of instances in which the model prefers a stereotyped over an anti-stereotyped sentence.
- StereoSet: Stereotype score, the percentage of contexts in which the model assigns a higher likelihood to the stereotypical than to the anti-stereotypical association.
- Co-occurrence (CooC): Average log-probability ratio of protected-attribute words in semantically neutral contexts (e.g., log odds for "nurse" being "female" vs. "male").
- BBQ: Two bias scores (ambiguous/disambiguated) as defined in prior work.
- SoFA: Variance in log-perplexity across a category-specific set of probe sentences, quantifying linguistic consistency over protected axes.
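Pairwise benchmarks such as CrowS-Pairs and StereoSet ultimately reduce to counting how often the stereotyped variant wins. A minimal sketch, assuming per-sentence log-likelihood scores have already been produced by some scorer (the real benchmarks define their own scoring, e.g. pseudo-log-likelihood for CrowS-Pairs); `stereotype_preference_rate` is a hypothetical helper, not a benchmark API.

```python
def stereotype_preference_rate(pairs):
    """Percentage of pairs where the stereotyped sentence scores strictly higher.

    pairs: iterable of (logp_stereo, logp_anti) tuples.
    50.0 means no systematic preference; ties count as non-stereotyped.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one pair")
    wins = sum(1 for s, a in pairs if s > a)
    return 100.0 * wins / len(pairs)

scores = [(-12.1, -13.0), (-9.4, -9.1), (-20.0, -21.5), (-7.7, -8.0)]
rate = stereotype_preference_rate(scores)  # 3 of 4 pairs prefer the stereotype -> 75.0
```

A score closer to 50 indicates less stereotype preference, which is how the CP and SS columns in the results are read.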
The bias penalty applied during quantization encourages similar representations for inputs that differ only in the protected attribute, while all the fairness metrics above are reserved for post-hoc evaluation (Proskurina et al., 18 Sep 2025).
3. Optimization Algorithm and Quantization Workflow
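The Fair-GPTQ optimization adapts the OBS (Optimal Brain Surgeon) framework to the bias-regularized objective. Quantizing a weight $w_q$ triggers the following update of the remaining full-precision weights $F$:

$$\delta_F = -\frac{w_q - \operatorname{quant}(w_q)}{\left[H_F^{-1}\right]_{qq}} \cdot \left(H_F^{-1}\right)_{:,q},$$

where $\operatorname{quant}(\cdot)$ rounds a weight to the nearest quantization-grid point, $H_F = 2\left(X_F X_F^\top + \alpha D_F D_F^\top\right)$ is the Hessian of the bias-penalized objective restricted to $F$, and $D = X_0 - X_1$.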
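A minimal NumPy sketch of one OBS step for the first column (where $F$ is still the full set of weights), assuming a symmetric 4-bit grid with a fixed, hand-picked scale. It uses the plain GPTQ Hessian for brevity; any positive-definite Hessian, including a bias-augmented one, is handled identically. The helper names are hypothetical.

```python
import numpy as np

def quantize_sym(w, scale, bits=4):
    """Symmetric uniform quantizer: integers in [-2^(b-1), 2^(b-1)-1] times scale."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def obs_update(W, Hinv, q, scale, bits=4):
    """Quantize column q of W and compensate the other columns with the OBS rule
    delta = -(w_q - quant(w_q)) / [H^-1]_qq * H^-1[:, q], applied row by row."""
    W = W.copy()
    w_q = W[:, q].copy()
    err = (w_q - quantize_sym(w_q, scale, bits)) / Hinv[q, q]  # one scalar per output row
    W -= np.outer(err, Hinv[:, q])                              # also moves column q onto the grid
    return W

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
X = rng.normal(size=(8, 64))
Hinv = np.linalg.inv(2.0 * X @ X.T)   # inverse of the plain GPTQ Hessian
W_new = obs_update(W, Hinv, q=0, scale=0.25)
```

After the update, column 0 sits exactly on the grid, and the layer's reconstruction error is never larger than rounding that column alone, because the OBS step is the optimal correction of the free weights.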
The operational pipeline includes:
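- Pre-processing each layer with a debiasing matrix update, $W \leftarrow W - \alpha\, W D D^\top \left(X X^\top + \alpha D D^\top\right)^{-1}$, where $D = X_0 - X_1$ is the hidden-state difference of the calibration pairs.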
- Column-by-column quantization in blocks of $B$ columns, using symmetric 4-bit quantization.
- Cholesky-based error correction within each block.
- Iteration across all quantized matrices in the selected layers, where the set of debiased layers is either every layer (full-model debiasing) or a subset of layers (partial-layer debiasing).
This approach retains GPTQ's computational efficiency, incurring only a modest additional quantization time (≈20%) relative to baseline (Proskurina et al., 18 Sep 2025).
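Putting the pipeline together, here is a hypothetical single-layer sketch in NumPy. Several pieces are assumptions rather than the paper's code: the penalty form $\alpha\lVert\hat{W}(X_0 - X_1)\rVert_2^2$, the debiasing update taken as the closed-form minimizer of that penalized objective, per-row absmax scales, and a 1% dampening term borrowed from the original GPTQ recipe. It also omits GPTQ's lazy block batching; deferring updates to block boundaries changes speed, not the result.

```python
import numpy as np

def quantize_sym(w, scale, bits=4):
    """Symmetric uniform quantizer onto {-2^(b-1), ..., 2^(b-1)-1} * scale."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def debias_update(W, X, D, alpha):
    """Closed-form minimizer of ||(W - W') X||^2 + alpha ||W' D||^2 over W' (assumed form):
    W' = W - alpha * W D D^T (X X^T + alpha D D^T)^{-1}."""
    A = X @ X.T + alpha * D @ D.T
    # A and D D^T are symmetric, so W D D^T A^{-1} = (A^{-1} D D^T W^T)^T
    return W - alpha * np.linalg.solve(A, D @ D.T @ W.T).T

def fair_gptq_layer(W, X, X0, X1, alpha=0.5, bits=4, damp=0.01):
    """Hypothetical sketch: debias, then GPTQ-style column-wise quantization
    with Cholesky-based error feedback."""
    D = X0 - X1
    W = debias_update(W, X, D, alpha)                          # new array; caller's W untouched
    H = 2.0 * (X @ X.T + alpha * D @ D.T)                     # bias-augmented Hessian
    H += damp * np.mean(np.diag(H)) * np.eye(H.shape[0])       # dampening for numerical stability
    U = np.linalg.cholesky(np.linalg.inv(H)).T                 # upper factor: H^-1 = U^T U
    qmax = 2 ** (bits - 1) - 1
    scale = np.maximum(np.abs(W).max(axis=1), 1e-12) / qmax    # per-row symmetric scale
    Q = np.zeros_like(W)
    for i in range(W.shape[1]):
        Q[:, i] = quantize_sym(W[:, i], scale, bits)
        err = (W[:, i] - Q[:, i]) / U[i, i]
        W[:, i:] -= np.outer(err, U[i, i:])                    # push error onto later columns
    return Q, scale

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 16))
X = rng.normal(size=(16, 128))
X0 = rng.normal(size=(16, 20))
X1 = X0 + 0.2 * rng.normal(size=(16, 20))
Q, scale = fair_gptq_layer(W, X, X0, X1)
```

The only changes relative to a plain GPTQ layer are the debiasing pre-update and the extra $\alpha DD^\top$ term in the Hessian, which is consistent with the modest time overhead described above.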
4. Experimental Design and Benchmarks
Fair-GPTQ is evaluated across the OPT model family and Mistral-7B, focusing on quantization of attention output-projection and feedforward layers at 4-bit precision. Baselines include full-precision (FP16), GPTQ-C4 (general calibration), and GPTQ-SS (StereoSet-paired calibration).
Calibration data consists of 4,212 stereotype/anti-stereotype pairs from the StereoSet development set. Zero-shot performance is assessed through ARC-Easy, HellaSwag, Cloze-English, and PIQA. Fairness is evaluated on CrowS-Pairs (9 axes), StereoSet (4 axes), Co-occurrence, BBQ (11 axes), and SoFA (1.49 million probes over 4 axes). Comparative baselines include post-hoc debiasing (Iterative Nullspace Projection, Self-Debias, SentenceDebias) applied to FP16 (Proskurina et al., 18 Sep 2025).
5. Empirical Results
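The table below summarizes 4-bit results for Mistral-7B and OPT-6.7B (Proskurina et al., 18 Sep 2025). PPL is perplexity (lower is better), the zero-shot columns are accuracies in %, and CP (CrowS-Pairs) and SS (StereoSet) are stereotype-preference rates in %, for which 50 indicates no preference. A dash marks a value not reproduced here.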
| Model / Method | PPL | ARC-E | Cloze | HellaSwag | PIQA | CooC | CP | SS |
|---|---|---|---|---|---|---|---|---|
| Mistral-7B FP16 | 5.50 | 79.55 | 78.29 | 60.93 | 80.25 | 84.33 | 65.95 | 64.01 |
| Mistral-7B GPTQ-SS | 5.74 | 78.75 | - | - | - | - | 65.77 | 63.79 |
| Mistral-7B Fair-GPTQ_ALL | 6.76 | 73.95 | - | - | - | - | 63.92 | 62.60 |
| OPT-6.7B FP16 | 10.24 | 65.57 | - | - | - | - | 67.98 | 66.97 |
| OPT-6.7B GPTQ-SS | 10.83 | - | - | - | - | - | 67.74 | 66.88 |
| OPT-6.7B Fair-GPTQ_ALL | 13.21 | - | - | - | - | - | 67.26 | 66.92 |
Key findings:
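- For Mistral-7B, Fair-GPTQ lowers both bias scores (CP 63.92, SS 62.60) relative to FP16 and GPTQ-SS; for OPT-6.7B it lowers CP (67.26 vs. 67.98 for FP16), while SS is essentially unchanged (66.92 vs. 66.88 for GPTQ-SS). Zero-shot accuracy stays at ≥90% of the FP16 level.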
- 4-bit quantization achieves ~4× memory reduction and ~2× speedup compared to FP16.
- Quantization time increases modestly (e.g., OPT-6.7B: 7.98 min for GPTQ vs. 10.40 min for Fair-GPTQ, about 30% longer).
- On selected axes, Fair-GPTQ achieves performance on par with INLP, outperforming Self-Debias and SentenceDebias on certain benchmarks.
- Layer ablations establish that focusing debiasing on lower layers or specific matrices (attention output-projection, FC2) achieves maximal bias reduction with minimal accuracy loss (Proskurina et al., 18 Sep 2025).
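The relative figures in these findings can be recomputed directly from the table. The short sketch below does only that arithmetic. The helper names are illustrative, and the memory ratio counts weight storage alone, ignoring quantization scales.

```python
def pct_of(new, ref):
    """new expressed as a percentage of ref."""
    return 100.0 * new / ref

def pct_change(new, ref):
    """Relative change from ref to new, in percent."""
    return 100.0 * (new - ref) / ref

arc_e_retained = pct_of(73.95, 79.55)   # Mistral-7B ARC-E, Fair-GPTQ_ALL vs FP16: ~92.96
ppl_mistral = pct_change(6.76, 5.50)    # Mistral-7B perplexity increase: ~22.9
ppl_opt = pct_change(13.21, 10.24)      # OPT-6.7B perplexity increase: ~29.0
quant_time = pct_change(10.40, 7.98)    # OPT-6.7B quantization time overhead: ~30.3
weight_bytes_ratio = 16 / 4             # FP16 vs 4-bit weight storage: 4.0
```

The ARC-E retention is consistent with the ≥90% claim, while perplexity rises by roughly a quarter, a cost worth weighing alongside the bias reductions.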
6. Analysis of Fairness Contributions and Ablations
Layer ablation demonstrates that restricting fairness-aware quantization to the lower ~10% of model layers provides most of the achievable bias reduction with minimal increase in perplexity. Matrix-type ablation identifies attention output-projection and FC2 as the principal contributors to group bias; debiasing exclusively these matrices captures most of the fairness benefit of full-model Fair-GPTQ.
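To make the partial-layer setting concrete, here is a hypothetical helper that selects the lower ~10% of decoder layers and only the attention output-projection and FC2 matrices. The module names follow Hugging Face's OPT naming (`self_attn.out_proj`, `fc2` under `model.decoder.layers`); treat this as an assumption about how one might target them, since other architectures name these matrices differently.

```python
def select_debias_targets(num_layers, frac=0.10,
                          matrices=("self_attn.out_proj", "fc2"),
                          prefix="model.decoder.layers"):
    """Return module names to debias: the lowest `frac` of layers (at least one),
    restricted to the given matrix types."""
    k = max(1, round(frac * num_layers))
    return [f"{prefix}.{i}.{m}" for i in range(k) for m in matrices]

targets = select_debias_targets(32)   # e.g. a 32-layer decoder: layers 0-2, two matrices each
```

All remaining matrices would then be quantized with plain GPTQ, which keeps the perplexity cost of debiasing small.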
Qualitative investigation via bias heatmaps (e.g., on the BBQ benchmark) confirms substantial reductions in stereotype prevalence. For instance, nationality bias in ambiguous QA contexts is reduced from 5.32% to 0.52% (Proskurina et al., 18 Sep 2025).
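The size of the nationality-bias reduction is easier to judge in relative terms; a two-line sketch of the arithmetic on the reported BBQ values:

```python
before, after = 5.32, 0.52                # BBQ nationality bias, ambiguous contexts (%)
abs_drop = before - after                 # 4.80 percentage points
rel_drop = 100.0 * abs_drop / before      # ~90.2% relative reduction
```

In other words, roughly nine tenths of the measured nationality bias in ambiguous contexts disappears.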
7. Limitations and Prospective Directions
The primary limitations of Fair-GPTQ are the narrow scope of its calibration data (monolingual, short contexts), evaluation limited to the OPT and Mistral architectures, and the absence of a direct extension to multilingual or multimodal models. Longer-context calibration and evaluation on more diverse and larger architectures (e.g., LLaMA-3, Qwen) remain open research directions (Proskurina et al., 18 Sep 2025).
A plausible implication is that integrating group-fairness constraints at quantization time offers a scalable avenue for improving the social desirability of LLM outputs without resorting to expensive retraining or inference-time debiasing. Further generalization and integration of fairness constraints into other model compression and deployment strategies remains an active area for exploration.