Quantification Learning: Prevalence Estimation
- Quantification Learning is a supervised task that estimates class-prevalence vectors from unlabeled data under distribution shifts.
- It employs techniques ranging from simple classify-and-count methods to advanced deep sequential models for robust quantification.
- Rigorous evaluation using protocols like APP with AE and RAE metrics ensures reliable benchmarking and optimization of quantifiers.
Quantification Learning (QL) is the supervised machine learning task of estimating the class-prevalence vector—that is, the relative frequencies of the classes of interest—in unlabelled data samples. Unlike standard classification, which aims to predict individual class labels, QL explicitly targets the aggregate label distribution of a sample. This distinction is especially relevant when there is prior probability shift or when the ultimate application requires only prevalence rather than instance-level predictions. QL has prominent applications in sentiment analysis, epidemiology, market research, and information retrieval contexts.
1. Formal Problem Setup and Task Definition
Let $L = \{(x_i, y_i)\}$ be a labelled training set, where $y_i \in \mathcal{Y} = \{y_1, \ldots, y_n\}$ ($n \ge 2$), and let $\sigma \subset \mathcal{D}$ be an unlabelled subset ("sample") drawn from the same domain $\mathcal{D}$. The goal of Quantification Learning (QL) is to train a quantifier $q$ that, for any $\sigma$, produces an estimate $\hat{p}_\sigma$ over $\mathcal{Y}$ of the true class-prevalence vector $p_\sigma = (p_\sigma(y_1), \ldots, p_\sigma(y_n))$, where

$$p_\sigma(y_j) = \frac{|\{x \in \sigma : y(x) = y_j\}|}{|\sigma|}, \qquad p_\sigma(y_j) \in [0, 1], \qquad \sum_{j=1}^{n} p_\sigma(y_j) = 1.$$
The QL task covers both the binary setting ($n = 2$, e.g. sentiment {Negative, Positive}) and single-label multiclass ($n > 2$, e.g. topic categorization), with each instance assigned exactly one class.
2. Data Generation and Experimental Protocols
LeQua@CLEF2022 (“Learning to Quantify”) established a rigorous evaluation protocol for QL (Esuli et al., 2021). Its datasets derive from Amazon product reviews, filtered by minimum text length and nonzero user voting, and are provided in two forms: pre-computed feature vectors (“vector task”) and raw documents (“raw-document task”).
For benchmarking, the lab adopted the Artificial Prevalence Protocol (APP), utilizing Kraemer's algorithm to sample prevalence vectors uniformly from the unit $(n-1)$-simplex. For each sampled prevalence vector $p$, the protocol assembles a test sample by randomly drawing, for each class $y_j$, a number of instances proportional to $p(y_j)$, thereby generating a large and diverse suite of samples exhibiting wide-ranging true prevalences. APP is essential for validating quantifiers under distribution shift.
Data splits include stratified training sets, development samples, and test samples, the latter constructed via APP to ensure prevalence diversity.
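To make the protocol concrete, the following is a minimal sketch of APP-style sample generation: Kraemer's algorithm (sort uniform draws, pad with 0 and 1, take consecutive differences) yields prevalence vectors distributed uniformly on the simplex, and a hypothetical `app_sample` helper then draws a test sample matching a given prevalence. The function names and the floor-based rounding are illustrative choices, not the exact LeQua implementation.

```python
import numpy as np

def kraemer_sample(n_classes: int, rng: np.random.Generator) -> np.ndarray:
    """Draw one prevalence vector uniformly from the unit (n_classes-1)-simplex:
    sort n-1 uniform draws, pad with 0 and 1, take consecutive differences."""
    cuts = np.sort(rng.uniform(size=n_classes - 1))
    return np.diff(np.concatenate(([0.0], cuts, [1.0])))

def app_sample(X, y, prevalence, sample_size, rng):
    """Assemble one APP test sample: draw roughly sample_size * prevalence[j]
    instances of each class j (floor rounding; real protocols distribute the
    remainder more carefully)."""
    counts = np.floor(sample_size * prevalence).astype(int)
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == j), size=c, replace=True)
        for j, c in enumerate(counts) if c > 0
    ])
    return X[idx], y[idx]

rng = np.random.default_rng(0)
p = kraemer_sample(3, rng)   # one random point on the 2-simplex
# X_s, y_s = app_sample(X, y, p, sample_size=250, rng=rng)
```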
3. Quantification Algorithms: Canonical Approaches
LeQua2022, using the QuaPy framework, evaluates several canonical methods for quantification (Esuli et al., 2021):
- Classify & Count (CC): Train a classifier $h$ on $L$, predict labels for each $x \in \sigma$, and output

$$\hat{p}_\sigma^{\mathrm{CC}}(y_j) = \frac{|\{x \in \sigma : h(x) = y_j\}|}{|\sigma|}.$$
This approach is generally suboptimal under prevalence shift.
- Adjusted Classify & Count (ACC): Applies a correction to CC using the classifier's true-positive rate ($\mathrm{tpr}$) and false-positive rate ($\mathrm{fpr}$); in the binary case,

$$\hat{p}_\sigma^{\mathrm{ACC}} = \frac{\hat{p}_\sigma^{\mathrm{CC}} - \mathrm{fpr}}{\mathrm{tpr} - \mathrm{fpr}},$$

where $\mathrm{tpr}$ and $\mathrm{fpr}$ are estimated on $L$, typically via cross-validation (see the sketch after the table below).
- Probabilistic Classify & Count (PCC) and Probabilistic Adjusted CC (PACC): Analogous to CC and ACC respectively, substituting expected counts derived from posterior probabilities for hard-label counts.
- SVM-based Quantifiers: Structured prediction algorithms, such as SVM(Q) and SVM(KLD), optimize surrogate losses that directly measure quantification error—absolute error on prevalences or KL divergence, respectively.
- Deep Learning Quantifiers: RNN-based architectures (e.g., QuaNet (Esuli et al., 2018)) process classifier score sequences and document embeddings to generate quantification embeddings, typically outperforming classical aggregative approaches under complex distribution shifts.
The following table encapsulates method types:
| Method | Mechanism | Correction/Optimization |
|---|---|---|
| CC | Hard labels | None |
| ACC | Hard labels | Confusion matrix inversion |
| PCC | Probabilities | None |
| PACC | Probabilities | Confusion matrix inversion |
| SVM(Q,KLD) | Structured optimization | Direct surrogate loss minimization |
| QuaNet | Deep sequential modeling | End-to-end quantification |
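As a concrete illustration of the first two rows of the table, here is a minimal binary sketch of CC and its ACC correction. The helper names (`cc`, `acc`) are hypothetical, labels are assumed to be encoded as 0/1 integers, and $\mathrm{tpr}$/$\mathrm{fpr}$ are estimated by cross-validation on the training set, one standard choice.

```python
import numpy as np
from sklearn.model_selection import cross_val_predict

def cc(h, X_test):
    """Classify & Count: prevalence = fraction of hard predictions per class."""
    preds = h.predict(X_test)            # assumes integer 0/1 labels
    return np.bincount(preds, minlength=2) / len(preds)

def acc(h, X_train, y_train, X_test, cv=10):
    """Adjusted Classify & Count (binary case): invert the confusion matrix
    via tpr/fpr estimated with cross-validation on the labelled data."""
    preds_cv = cross_val_predict(h, X_train, y_train, cv=cv)
    tpr = np.mean(preds_cv[y_train == 1] == 1)
    fpr = np.mean(preds_cv[y_train == 0] == 1)
    p_cc = cc(h, X_test)[1]              # CC estimate for class 1
    p1 = (p_cc - fpr) / (tpr - fpr)      # ACC correction
    p1 = float(np.clip(p1, 0.0, 1.0))    # project back onto [0, 1]
    return np.array([1.0 - p1, p1])
```

Given a fitted classifier, e.g. `h = LogisticRegression().fit(X_train, y_train)`, `acc(h, X_train, y_train, X_test)` returns the corrected prevalence vector; the clipping step is needed because the raw correction can leave $[0, 1]$ when $\mathrm{tpr} - \mathrm{fpr}$ is small.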
4. Evaluation Metrics and Protocols
LeQua2022 employs two primary error measures in quantification (Esuli et al., 2021):
- Absolute Error (AE):

$$\mathrm{AE}(p, \hat{p}) = \frac{1}{n} \sum_{j=1}^{n} |\hat{p}(y_j) - p(y_j)|$$

- Relative Absolute Error (RAE):

$$\mathrm{RAE}(p, \hat{p}) = \frac{1}{n} \sum_{j=1}^{n} \frac{|\hat{p}(y_j) - p(y_j)|}{p(y_j)}$$

For classes with $p(y_j) = 0$, additive-$\epsilon$ smoothing with $\epsilon = \frac{1}{2|\sigma|}$ is applied so that RAE remains well defined:

$$\underline{p}(y_j) = \frac{p(y_j) + \epsilon}{1 + \epsilon n}$$

(and analogously for $\hat{p}$).
Ranking and significance of quantifiers are determined via average RAE across test samples and paired-comparison Wilcoxon signed-rank tests.
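A minimal sketch of both metrics, including the additive smoothing applied before RAE (the $\epsilon = 1/(2|\sigma|)$ choice follows the formula above):

```python
import numpy as np

def absolute_error(p_true, p_hat):
    """AE: mean absolute difference between true and estimated prevalences."""
    return np.mean(np.abs(np.asarray(p_hat) - np.asarray(p_true)))

def relative_absolute_error(p_true, p_hat, sample_size):
    """RAE with additive smoothing so zero-prevalence classes are handled:
    eps = 1/(2|sigma|), p -> (p + eps) / (1 + eps * n)."""
    eps = 1.0 / (2.0 * sample_size)
    n = len(p_true)
    p_s = (np.asarray(p_true) + eps) / (1.0 + eps * n)
    p_hat_s = (np.asarray(p_hat) + eps) / (1.0 + eps * n)
    return np.mean(np.abs(p_hat_s - p_s) / p_s)
```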
5. Theoretical Frameworks for Quantification Under Shift
Recent advances formalize QL as a constrained estimation problem optimized directly for the loss of interest (Firat, 2016, Dussap et al., 2023):
- Unified Regression Framework (Firat, 2016): All major QL methods can be cast as constrained multivariate regression. Given a matrix $M \in \mathbb{R}^{d \times n}$ of class-conditional feature expectations estimated on training data and the test-sample feature average $\mu_\sigma \in \mathbb{R}^d$, the estimated prevalence vector solves

$$\hat{p}_\sigma = \operatorname*{arg\,min}_{p \in \Delta^{n-1}} \; \mathcal{L}(Mp, \mu_\sigma),$$

where $\Delta^{n-1}$ is the probability simplex. The loss $\mathcal{L}$ to minimize (MSE, KL divergence, $L^1$, or Hellinger) determines the specific statistical program (quadratic, linear, or nonlinear).
- Distribution Feature Matching (DFM) (Dussap et al., 2023): Let $\Phi$ be a feature map (e.g., the canonical RKHS mapping for a kernel). DFM solves

$$\hat{p}_\sigma = \operatorname*{arg\,min}_{p \in \Delta^{n-1}} \; \Big\| \sum_{j=1}^{n} p_j \, \hat{\mu}_j - \hat{\mu}_\sigma \Big\|,$$

where $\hat{\mu}_j$ is the empirical mean embedding of class $y_j$ on training data and $\hat{\mu}_\sigma$ that of the test sample, with robust, finite-sample bounds involving the Gram matrix curvature and explicit formulas for contamination robustness.
This formalization unifies ACC, BBSE, KMM, and MLLS under generalized feature matching; a minimal squared-loss instance is sketched below.
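The following sketch implements the squared-loss (MSE) instance of generalized feature matching, solving $\min_{p \in \Delta^{n-1}} \|Mp - \mu_\sigma\|^2$ with an off-the-shelf constrained optimizer. Raw averaged feature vectors stand in for a kernel mean embedding here, and the function name is illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def feature_matching_prevalence(M, mu):
    """Estimate prevalences by matching feature means on the simplex.

    M  : (d, n) matrix; column j is the mean feature vector of class j,
         estimated on labelled training data.
    mu : (d,) mean feature vector of the unlabelled test sample.
    """
    n = M.shape[1]
    p0 = np.full(n, 1.0 / n)                           # start from uniform
    res = minimize(
        lambda p: np.sum((M @ p - mu) ** 2),           # squared matching loss
        p0,
        jac=lambda p: 2.0 * M.T @ (M @ p - mu),
        bounds=[(0.0, 1.0)] * n,
        constraints=[{"type": "eq", "fun": lambda p: p.sum() - 1.0}],
        method="SLSQP",                                # handles simplex constraints
    )
    return res.x
```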
6. Comparative Analysis and Key Findings
Empirical studies (e.g., QuaPy, LeQua2022, SVM-based quantifiers (Esuli et al., 2015, Moreo et al., 2021)) reveal that:
- CC exhibits substantial degradation under distribution shift; it violates Vapnik's principle by solving the harder problem of instance-level classification as an intermediate step when only aggregate prevalences are needed.
- ACC/PACC outperform CC, particularly in high-shift regimes, by correcting systematic classifier bias.
- SVM(Q), SVM(KLD), and RNN-based approaches deliver further improvements, especially where class-prevalence drift is severe, by direct loss minimization or sequence modeling.
- Evaluation must use APP (uniform simplex sampling) to properly assess quantifier robustness across the range of possible prevalences; reporting AE and RAE provides rigorous, comparable metrics.
7. Contributions, Benchmarking, and Future Directions
LeQua2022 established the first CLEF shared task focused exclusively on quantification (Esuli et al., 2021):
- Benchmarking: QuaPy (open-source) provides standard data generators, quantification metrics, and method implementations; a brief usage sketch follows this list.
- Protocol Standardization: Adoption of Kraemer's prevalence sampling and statistical significance frameworks.
- Research Trajectory: The initiative paves the way for cross-lingual quantification, adaptation to new data modalities (streams, networks), and ongoing refinement of quantification-oriented loss functions and algorithms.
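For illustration, a usage sketch along the lines of QuaPy's documented API (the dataset name, keyword arguments, and method locations are taken from its public examples and may differ across library versions):

```python
import quapy as qp
from quapy.method.aggregative import ACC
from sklearn.linear_model import LogisticRegression

# Load a review dataset as tf-idf vectors, fit an ACC quantifier on the
# training split, and estimate the class prevalences of the test set.
dataset = qp.datasets.fetch_reviews('hp', tfidf=True, min_df=5)

quantifier = ACC(LogisticRegression())
quantifier.fit(dataset.training)

estim_prev = quantifier.quantify(dataset.test.instances)
true_prev = dataset.test.prevalence()
print('estimated:', estim_prev, 'true:', true_prev)
```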
Further directions include multi-label and streaming quantification, domain adaptation, and deep learning methods robust to non-IID shifts.
In summary, Quantification Learning is now formalized as a supervised estimation problem in which the primary objective is accurate and robust prediction of class-prevalence vectors under realistic (often non-IID) sampling and prior shift. The research landscape encompasses regression frameworks, feature matching, confusion-matrix corrections, and direct multivariate optimization, supported by precise evaluation standards and public benchmarks, with proven benefits observable in both synthetic simulation and real-world textual data (Esuli et al., 2021, Firat, 2016, Dussap et al., 2023).