Pool Adjacent Violators Algorithm (PAVA)

Updated 28 May 2026

The Pool Adjacent Violators Algorithm (PAVA) is an efficient method for isotonic regression that iteratively merges adjacent blocks violating a prescribed monotonic order.
It applies to domains such as classifier calibration, shape-constrained regression, and projection onto structured cones, and it optimizes a range of convex and quasiconvex objectives.
PAVA attains the global optimum in linear time for totally ordered data and adapts to weighted, antitonic, and multidimensional order constraints.

The Pool Adjacent Violators Algorithm (PAVA) is a fundamental method in nonparametric statistics and optimization for imposing monotonicity constraints in regression, calibration, and projection problems. It computes isotonic (or antitonic) fits efficiently by iteratively merging contiguous blocks that violate the prescribed order, yielding piecewise constant solutions optimal under various convex and quasiconvex objectives. PAVA is central both as a direct estimator in isotonic regression and as a computational primitive within broader frameworks such as calibration of classifiers, projection onto structured cones, and shape-constrained regularization.

1. Formal Statement and Algorithmic Structure

Consider data $(y_1, \ldots, y_n)$, optionally with weights $w_1, \ldots, w_n > 0$, and the goal of finding fitted values $(\hat y_1, \ldots, \hat y_n)$ that satisfy a monotonicity constraint, e.g., $\hat y_1 \leq \hat y_2 \leq \cdots \leq \hat y_n$, and minimize a separable loss such as weighted least squares: $\min_{\hat y_1 \leq \cdots \leq \hat y_n} \sum_{i=1}^n w_i (y_i - \hat y_i)^2$.

The classical PAVA operates by initializing each index as its own block, then repeatedly merging adjacent blocks whose block means violate the order constraint, assigning the pooled block $B$ its weighted mean $\sum_{i \in B} w_i y_i / \sum_{i \in B} w_i$ (for least squares) or a more general blockwise statistic (for generic objectives). Merges continue until no violations remain. The final $\hat y_i$ are constant on each block.

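The basic least-squares PAVA, in outline:

Initialize one block per index $i$, with mean $y_i$ and weight $w_i$.
Scan from left to right; whenever a block's mean exceeds that of its right neighbor, merge the two into one block whose weight is the sum of the two weights and whose mean is their weighted mean.
After each merge, re-check the merged block against its left neighbor and merge again if the order is still violated.
Stop when the block means are nondecreasing from left to right.

Final fitted values are assigned by setting all $i$ in a block to its block mean.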
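The procedure above can be sketched in Python. This is a minimal illustration of a stack-based weighted least-squares fit, not a reference implementation; the function name `pava` and the block layout are choices made here for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def pava(y: Sequence[float], w: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted least-squares isotonic (nondecreasing) fit via a block stack."""
    if w is None:
        w = [1.0] * len(y)
    if len(w) != len(y) or any(wi <= 0 for wi in w):
        raise ValueError("weights must be positive and match y in length")

    # Each stack entry is a block: (weighted mean, total weight, point count).
    blocks: List[Tuple[float, float, int]] = []
    for yi, wi in zip(y, w):
        blocks.append((float(yi), float(wi), 1))
        # Pool while the newest block's mean falls below its predecessor's.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append(((w1 * m1 + w2 * m2) / total, total, c1 + c2))

    # Expand blocks back to one fitted value per input index.
    fit: List[float] = []
    for mean, _weight, count in blocks:
        fit.extend([mean] * count)
    return fit


# Hypothetical input: the values 3, 2, 0 are pooled to their mean 5/3.
print(pava([1, 3, 2, 0, 4]))
```

Each merge removes one block from the stack, so the inner loop runs at most $n - 1$ times over the whole scan.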

2. Theoretical Foundations and Optimality

PAVA is not restricted to least-squares loss. For monotonic calibration of binary probabilistic classifiers, the optimization involves aggregate regular binary proper scoring rules (RBPSR), such as log-loss or Brier score, with monotonicity constraints on the calibrated probabilities $p_1 \leq \cdots \leq p_T$. The key property is that, for any contiguous block of examples, the sum of costs is quasiconvex (but not necessarily convex) in the block value and is minimized at a block-average expression (for example, $r = m/(m+n)$, where $m$ and $n$ are the counts of the two classes within the block).

The algorithm greedily pools violating blocks, always replacing their values by the unique local (and in fact global) minimizer, ensuring that the monotonicity constraint and blockwise optimality lead to global optimality for the full sequence. Convexity of the scoring rule is not needed; only blockwise quasiconvexity and properness are required (Brummer et al., 2013).
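The blockwise claim can be checked numerically for two familiar scoring rules. The sketch below assumes that $m$ counts the positive examples in a block and $n$ the negative ones; the counts and the helper names `block_cost` and `grid_argmin` are hypothetical. It brute-forces the minimizer of the aggregate Brier and log-loss cost over a grid and compares it with $m/(m+n)$.

```python
import math


def block_cost(r: float, m: int, n: int, rule: str) -> float:
    """Aggregate cost of assigning probability r to m positives and n negatives."""
    if rule == "brier":
        return m * (1.0 - r) ** 2 + n * r ** 2
    if rule == "log":
        return -m * math.log(r) - n * math.log(1.0 - r)
    raise ValueError(f"unknown rule: {rule}")


def grid_argmin(m: int, n: int, rule: str, steps: int = 1000) -> float:
    """Brute-force minimizer of the block cost over an open grid in (0, 1)."""
    grid = [k / steps for k in range(1, steps)]
    return min(grid, key=lambda r: block_cost(r, m, n, rule))


m, n = 3, 5  # hypothetical block: 3 positives, 5 negatives
for rule in ("brier", "log"):
    # Both rules should report the same minimizer, the block fraction m / (m + n).
    print(rule, grid_argmin(m, n, rule), m / (m + n))
```

Both rules share the same blockwise minimizer, which is why a single pooled value serves all of them.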

A generalization shows that PAVA produces solutions that are simultaneously optimal for entire classes of loss functions associated with certain identification functionals, including means, quantiles, and expectiles. This simultaneous optimality is unattainable for unimodal constraints or settings where the admissible superlevel sets lack lattice structure (Jordan et al., 2019).

3. Applications Across Domains

PAVA serves as a computational primitive across diverse application areas:

Probabilistic Calibration and Reliability Diagrams: PAVA is used for calibrating forecast probabilities, producing monotonic mappings that minimize miscalibration under proper scoring rules. The CORP reliability diagram framework leverages PAVA for consistent, optimally binned, and reproducible diagrams, supporting both visualization and rigorous score decompositions (Dimitriadis et al., 2020).
Binary Classifier Calibration: For binary classifier outputs, PAVA directly optimizes the monotonic calibration objective for all regular binary proper scoring rules, encompassing both probability and log-likelihood-ratio calibration (Brummer et al., 2013).
Shape-Constrained Regression and Sparse Regularization: In ordered Lasso and sparse time-lagged regression, PAVA computes the proximal operator under monotonicity plus sparsity constraints. The core operation consists of subtracting the regularization term, computing a nonincreasing PAVA fit, and thresholding at zero to enforce nonnegativity (Suo et al., 2014).
Projection onto Structured Cones: PAVA is used as a subroutine for the Euclidean projection onto monotone cones, monotone nonnegative cones, and monotone extended second-order cones. For MESOC, the projection reduces to an augmented isotonic regression in one higher dimension, solved efficiently by PAVA (Ferreira et al., 2021, Németh et al., 2012).
Sorted Penalty Proximal Operators: For a family of sorted nonconvex penalties (e.g., SLOPE, MCP, SCAD, ℓq), the proximal mapping is solved via PAVA in the weakly convex case; minor adaptations suffice for local minimization in the nonconvex regime (Gagneux et al., 2025).
Piecewise Monotone Estimation: Nearly isotonic regression and generalized order constraints (e.g., network learning with product orders) employ modified or iterative PAVA as the main algorithmic engine (Matsuda et al., 2021, Feelders, 2012).

4. Algorithmic Variants and Computational Complexity

The basic version of PAVA runs in $O(n)$ time and $O(n)$ space. Each merge reduces the number of blocks by one, so at most $n - 1$ merges occur in total, and maintenance of blocks can be implemented with a stack or linked list. Variants exist to handle:

Antitonic Constraints: Nonincreasing fits for problems requiring monotonicity in the opposite direction are handled via analogous block merging rules.
Weighted Projections: Variable weights are incorporated naturally into block statistics.
Partial Orders and Product Orders: For multidimensional lattice constraints, iterative application of 1D PAVA sweeps yields globally optimal isotonic regression (Feelders, 2012).
Accelerated Updates: For settings where a sequence of projections with only minor data perturbations is needed (e.g., distributional regression across multiple slices), abridged PAVA updates only the affected block and nearby merges, yielding empirical speedups by orders of magnitude (Henzi et al., 2020).
Piecewise and Nearly Isotonic Settings: Modified PAVA variants compute penalized estimators that allow a small number of order violations, as in change-point estimation and model selection (Matsuda et al., 2021).
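The antitonic and weighted variants in the list above can be illustrated with one short sketch. Rather than writing separate merge rules, it obtains the nonincreasing fit by negating the data, running a nondecreasing fit, and negating the result; this is one common reduction and is an assumption of this example, as are the function names.

```python
from typing import List, Optional, Sequence


def isotonic_fit(y: Sequence[float], w: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted least-squares nondecreasing fit (stack-based pooling)."""
    w = [1.0] * len(y) if w is None else list(w)
    blocks = []  # each block: (weighted mean, total weight, point count)
    for yi, wi in zip(y, w):
        mean, weight, count = float(yi), float(wi), 1
        # Absorb preceding blocks whose mean exceeds the current block's mean.
        while blocks and blocks[-1][0] > mean:
            pm, pw, pc = blocks.pop()
            mean = (pw * pm + weight * mean) / (pw + weight)
            weight, count = pw + weight, pc + count
        blocks.append((mean, weight, count))
    return [m for m, _w, c in blocks for _k in range(c)]


def antitonic_fit(y: Sequence[float], w: Optional[Sequence[float]] = None) -> List[float]:
    """Nonincreasing fit: negate the data, fit nondecreasing, negate back."""
    return [-v for v in isotonic_fit([-yi for yi in y], w)]


print(antitonic_fit([5, 4, 6, 1]))    # 4 and 6 violate the order and are pooled
print(antitonic_fit([1, 3], [1, 3]))  # weights pull the pooled value toward 3
```

Weights enter only through the block statistics, so the merge logic is identical in the weighted and unweighted cases.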

5. Extensions: Scoring Rules, Partial Monotonicity, and Cone Projections

PAVA’s scope extends far beyond least-squares isotonic regression:

Regular Binary Proper Scoring Rules: For arbitrary RBPSR, block merging is governed by poolings that minimize blockwise aggregate cost, and PAVA is shown to optimize all such rules, unifying probabilistic calibration methods (Brummer et al., 2013).
Partial and Product Orders: In Bayesian network parameter learning and related settings, PAVA is iteratively applied across each dimension of a product of linear orders, converging to the unique global minimizer for the induced partial order (Feelders, 2012).
Monotone Cones and MESOC: Projections onto cones such as MESOC reduce to an isotonic regression with one additional coordinate, and projections onto monotone nonnegative cones are given by a PAVA fit followed by thresholding (Ferreira et al., 2021, Németh et al., 2012).
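The "PAVA fit followed by thresholding" recipe for the monotone nonnegative cone can be sketched as follows. The example assumes the cone is written as $\{x : 0 \leq x_1 \leq \cdots \leq x_n\}$; the cited works may use the reversed (nonincreasing) convention, which corresponds to reversing the coordinate order. Function names are hypothetical.

```python
from typing import List, Sequence


def isotonic_fit(y: Sequence[float]) -> List[float]:
    """Unweighted least-squares nondecreasing fit via block pooling."""
    blocks = []  # each block: (sum of values, number of points)
    for yi in y:
        total, count = float(yi), 1
        # Pool with the previous block while its mean exceeds the current mean.
        while blocks and blocks[-1][0] / blocks[-1][1] > total / count:
            ps, pc = blocks.pop()
            total, count = total + ps, count + pc
        blocks.append((total, count))
    return [s / c for s, c in blocks for _k in range(c)]


def project_monotone_nonneg(y: Sequence[float]) -> List[float]:
    """Projection onto {x : 0 <= x_1 <= ... <= x_n}: PAVA fit, then clip at zero."""
    return [max(v, 0.0) for v in isotonic_fit(y)]


# Hypothetical input: 1 and -3 are pooled to -1, which is then clipped to 0.
print(project_monotone_nonneg([1.0, -3.0, 2.0]))
```

Clipping preserves the nondecreasing order of the fit, so the thresholded vector remains inside the cone.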

6. Illustrative Examples and Practical Implementation

PAVA’s operation is transparent in concrete examples: given an input sequence, the algorithm sequentially pools adjacent blocks that violate monotonicity, replaces each pooled block by its mean, and returns a final fit that is constant on each surviving block.
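As an illustration with a hypothetical input (chosen here for concreteness, not taken from the cited works), take $y = (1, 3, 2, 0, 4)$ with unit weights:

Merge $\{3\}$ and $\{2\}$, since $3 > 2$: pooled mean $5/2$, giving blocks $(1 \mid 5/2, 5/2 \mid 0 \mid 4)$
Merge $\{3, 2\}$ and $\{0\}$, since $5/2 > 0$: pooled mean $5/3$, giving blocks $(1 \mid 5/3, 5/3, 5/3 \mid 4)$
No violations remain, since $1 \leq 5/3 \leq 4$
Final fit: $(1, 5/3, 5/3, 5/3, 4)$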

For classifier calibration, binary labels ordered by classifier score are processed via PAVA to yield blockwise fractions of positive labels, with sequential merges producing the unique optimal monotonic fit for all regular binary proper scoring rules.
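A minimal sketch of the calibration case, assuming the 0/1 labels are already sorted by increasing classifier score and ignoring tied scores. The labels and the function name are hypothetical. Blocks store integer counts, so order violations are detected by exact cross-multiplication rather than floating-point comparison.

```python
from typing import List, Sequence


def calibrate_sorted_labels(labels: Sequence[int]) -> List[float]:
    """Monotone calibrated probabilities for 0/1 labels sorted by increasing score."""
    blocks = []  # each block: (number of positives, number of examples)
    for label in labels:
        if label not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        pos, tot = int(label), 1
        # Violation: previous fraction > current fraction (cross-multiplied, exact).
        while blocks and blocks[-1][0] * tot > pos * blocks[-1][1]:
            ppos, ptot = blocks.pop()
            pos, tot = pos + ppos, tot + ptot
        blocks.append((pos, tot))
    # Every example in a block receives the block's fraction of positives.
    return [pos / tot for pos, tot in blocks for _k in range(tot)]


labels = [0, 0, 1, 0, 1, 1, 0, 1]  # hypothetical labels, already sorted by score
print(calibrate_sorted_labels(labels))
```

For these labels the pooled blocks are $\{1, 0\}$ with fraction $1/2$ and $\{1, 1, 0\}$ with fraction $2/3$; the remaining examples keep their own labels.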

Efficient stack-based implementation guarantees $O(n)$ time by only scanning and updating violated block pairs, with each block merge strictly reducing the number of active blocks (Brummer et al., 2013, Feelders, 2012). Accelerated or abridged versions strategically update only changed blocks for repeated fits (Henzi et al., 2020).

7. Limitations, Generalization, and Simultaneous Optimality

Simultaneous optimality across classes of loss functions is a feature of isotonic regression and PAVA in total (and some partial) orders, but fails for more complex constraints such as unimodal regression. The min–max and lattice-theoretic characterization of solution structure underlies both uniqueness and optimality. PAVA’s robustness extends to generalized identification functionals and nonconvex penalty structures under careful adaptation, but the underlying assumption of well-structured order constraints is essential for global convergence and statistical guarantees (Jordan et al., 2019, Gagneux et al., 2025).
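For the unweighted least-squares case on a total order, the min–max characterization takes the classical form $\hat y_i = \max_{s \leq i} \min_{t \geq i} \mathrm{Av}(s, t)$, where $\mathrm{Av}(s, t)$ is the mean of $y_s, \ldots, y_t$. The sketch below evaluates this formula by cubic-time brute force, purely as a cross-check on small inputs; the function name is hypothetical.

```python
from typing import List, Sequence


def minmax_isotonic(y: Sequence[float]) -> List[float]:
    """Isotonic least-squares fit from the max-min formula over interval means."""
    n = len(y)
    prefix = [0.0]
    for v in y:
        prefix.append(prefix[-1] + v)

    def avg(s: int, t: int) -> float:
        # Mean of y[s..t], inclusive, from prefix sums.
        return (prefix[t + 1] - prefix[s]) / (t - s + 1)

    return [
        max(min(avg(s, t) for t in range(i, n)) for s in range(i + 1))
        for i in range(n)
    ]


# Hypothetical input; the middle three values share the interval mean 5/3.
print(minmax_isotonic([1, 3, 2, 0, 4]))
```

On small inputs the result can be compared against a PAVA fit; the two should agree up to floating-point rounding.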

In summary, PAVA is a central computational and theoretical tool across machine learning, statistical inference, shape-constrained estimation, and convex projection domains, distinguished by its linear complexity, global optimality properties, and versatility in accommodating diverse monotonicity-constrained problems (Brummer et al., 2013, Jordan et al., 2019, Dimitriadis et al., 2020, Ferreira et al., 2021, Gagneux et al., 2025, Suo et al., 2014, Németh et al., 2012, Feelders, 2012, Henzi et al., 2020, Matsuda et al., 2021).