Consistency-Driven Abstention & Selection
- Consistency-driven abstention/selection is a framework where models decide to predict or abstain based on uncertainty quantification, ensuring calibrated and risk-aware decisions.
- The paper reveals that while the Lovász hinge is inconsistent for non-modular set functions, it is consistent for a structured abstain loss when paired with calibrated link functions.
- Experimental results in structured tasks like semantic segmentation demonstrate that principled abstention yields interpretable uncertainty patterns and a tunable trade-off between prediction accuracy and coverage.
Consistency-driven abstention/selection refers to frameworks and algorithms in which a predictive model, learning system, or decision pipeline chooses either to make an explicit prediction/selection or to withhold a decision (“abstain”) based on internal consistency criteria or uncertainty quantification, often to achieve theoretical guarantees of reliability, calibration, and risk minimization. This paradigm is central in selective classification, structured prediction, online learning, and sequential decision-making, especially under distributional or structural uncertainty, data noise, fairness or safety constraints, or ambiguous task specifications.
1. Formalization of Consistency-Driven Abstention in Structured Prediction
The structured abstain problem, as formalized in the context of the Lovász hinge loss, generalizes selective classification to the structured prediction setting with a vector-valued output space. Here, a predictor is allowed to abstain on any subset of binary predictions rather than being forced to commit to every coordinate. Let $\mathcal{R} = \{-1, 0, 1\}^n$ denote the extended report space, where $0$ denotes abstention per coordinate. The target loss for the abstain problem is of the form
$$\ell^{\mathrm{abs}}(r, y) = \tfrac{1}{2}\big(f(M \cup A) + f(M)\big), \qquad M = \{i : r_i = -y_i\}, \quad A = \{i : r_i = 0\},$$
where $f$ is a submodular set function (e.g., Jaccard), $r \in \{-1, 0, 1\}^n$ is the prediction vector, and $y \in \{-1, 1\}^n$ is the ground-truth label vector. This target loss penalizes confident misclassifications strictly and imposes a smaller additional cost for abstentions: an abstained coordinate enters only the first term, so deferring is cheaper than a confident mistake. The framework is “consistency driven” in the sense that the predictor seeks to abstain, coordinatewise, whenever making a confident prediction would be worse, in expectation, than deferral.
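As a concrete sketch, the target loss can be computed by evaluating $f$ on the confident-mistake set both with and without the abstained coordinates and averaging the two values; this is a hedged reconstruction of the loss form, and the set function `f_sqrt` below (a concave function of cardinality, hence a polymatroid) is an illustrative stand-in rather than the paper's choice:

```python
import math

def f_sqrt(S):
    """Illustrative polymatroid: a concave function of cardinality is
    normalized, increasing, and submodular. Stands in for e.g. Jaccard."""
    return math.sqrt(len(S))

def structured_abstain_loss(r, y, f=f_sqrt):
    """Hedged sketch of the structured abstain loss: average f over the
    confident-mistake set with and without the abstained coordinates.

    r : report in {-1, 0, +1}^n (0 = abstain on that coordinate)
    y : label in {-1, +1}^n
    """
    mistakes = {i for i, (ri, yi) in enumerate(zip(r, y)) if ri == -yi}
    abstained = {i for i, ri in enumerate(r) if ri == 0}
    return 0.5 * (f(mistakes | abstained) + f(mistakes))
```

Note that under this form a confident mistake on one coordinate costs twice as much as abstaining on it (for cardinality-like $f$), matching the intuition that deferral is the cheaper of the two errors.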
2. Inconsistency of the Lovász Hinge for Non-Modular Set Functions
Although the Lovász hinge is a popular convex surrogate in structured prediction (in particular for image segmentation and related tasks where submodular losses arise), it is not consistent for the canonical structured binary classification target when the set function $f$ is genuinely submodular (i.e., submodular but not modular). The core inconsistency arises because the Lovász hinge loss on its continuous surrogate variable does not precisely embed the structured error when $f$ is non-modular.
Specifically, for any non-modular $f$, there exist distributions over labels such that the surrogate minimizer necessarily abstains (i.e., has at least one coordinate equal to zero), so the optimal surrogate prediction, once mapped back to binary decisions, is suboptimal under the discrete loss. For modular $f$, the Lovász hinge embeds the discrete loss exactly and consistency is restored. This concrete separation is rigorously demonstrated in the embedding framework of Finocchiaro et al.
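For reference, the Lovász hinge itself can be sketched in a few lines using one common formulation (Yu and Blaschko's): apply the Lovász extension of $f$ to the clipped hinge margins, accumulating the discrete gradients of $f$ along the coordinates sorted by decreasing margin. The code below is a minimal sketch under that formulation, assuming $f(\emptyset) = 0$:

```python
def lovasz_hinge(u, y, f):
    """Lovász hinge surrogate: Lovász extension of f applied to the
    clipped hinge margins max(0, 1 - y_i * u_i).

    u : real-valued scores; y : labels in {-1, +1}^n;
    f : set function on subsets of {0, ..., n-1} with f(emptyset) = 0.
    """
    n = len(u)
    margins = [max(0.0, 1.0 - yi * ui) for ui, yi in zip(u, y)]
    # Sort coordinates by decreasing margin and accumulate the
    # discrete gradients f(S_i) - f(S_{i-1}) along that order.
    order = sorted(range(n), key=lambda i: -margins[i])
    loss, prefix, prev = 0.0, set(), 0.0
    for i in order:
        prefix.add(i)
        fv = f(prefix)
        loss += margins[i] * (fv - prev)
        prev = fv
    return loss
```

For modular $f$ (e.g., cardinality, `f=len`) this reduces to a sum of coordinatewise hinge losses, which is the regime in which the surrogate is consistent for the standard target.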
3. The Structured Abstain Problem as a Consistent Surrogate Target
By leveraging the embedding theory for polyhedral surrogates, the structured abstain problem is shown to be the true target for which the Lovász hinge is consistent, even for broad classes of (non-modular) submodular functions. That is, if the model is allowed to report in $\{-1, 0, 1\}^n$, optimizing the Lovász hinge as a surrogate yields consistency with respect to the structured abstain loss rather than the original full-decision loss.
In this framework, abstention is principled: the model avoids predicting on uncertain or ambiguous coordinates, and the total loss is minimized when the trade-off between abstaining and potentially misclassifying is optimal in expectation. Consistency in this setting means that, given a properly calibrated link, surrogate minimization produces predictions that are minimizers of the expected structured abstain loss.
4. Calibrated Link Functions for Polyhedral Surrogates
A critical technical ingredient is the construction of calibrated link functions that map continuous surrogate outputs to abstain-augmented discrete reports in a manner that restores consistency across all polymatroids (i.e., normalized, increasing submodular set functions). Using the embedding framework, the authors define a family of link functions via an $\varepsilon$-thickened envelope around the vertices of the surrogate polytope, the so-called "common link envelope." This construction is independent of the particular polymatroid $f$, and pointwise inclusion in the envelope is sufficient to guarantee calibration.
Consequently, for any surrogate output $u$, one selects the vertex or face of the hypercube (in the polyhedral decomposition) that is within $\varepsilon$ of $u$ in sup-norm, and one of several explicit link rules (e.g., based on the largest or smallest gap in the ordered coordinates) yields a mapping that is simultaneously consistent for all polymatroids. Sufficient conditions for a tight embedding, in which each possible abstain pattern corresponds to exactly one predicted output, are given in terms of the structure of the polymatroid.
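A much-simplified coordinatewise version of such a link can be sketched as follows: snap a surrogate coordinate to $+1$ or $-1$ only when it lies within $\varepsilon$ of that hypercube vertex coordinate, and abstain otherwise. The threshold value and the purely coordinatewise rule are illustrative simplifications, not the paper's exact common-link-envelope construction:

```python
def envelope_link(u, eps=0.25):
    """Simplified sketch of an abstention link: commit to +1/-1 only when
    the surrogate coordinate is within eps of the corresponding hypercube
    vertex coordinate; otherwise report 0 (abstain). The value of eps and
    the coordinatewise rule are illustrative assumptions.
    """
    r = []
    for ui in u:
        if ui >= 1.0 - eps:
            r.append(1)       # confidently positive
        elif ui <= -1.0 + eps:
            r.append(-1)      # confidently negative
        else:
            r.append(0)       # abstain on this coordinate
    return r
```

Shrinking `eps` makes the predictor more conservative (more abstentions); enlarging it increases coverage at the risk of more confident mistakes.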
5. Experimental Demonstrations and Interpretability
Empirical studies in structured prediction tasks (notably semantic segmentation with submodular losses such as the Jaccard loss) confirm that the structured abstain loss is operationally meaningful. Abstention events (predicted zeros in $r$) tend to align with spatial regions of high uncertainty or label noise, affording the following advantages:
- Interpretability: Abstention patterns can be visualized and interpreted as model uncertainty, especially in ambiguous image regions.
- Coverage–Performance Tradeoffs: By tuning the threshold parameter $\varepsilon$ in the link, one can adjust the overall abstention rate, trading off accuracy on non-abstained coordinates against model conservativeness.
- No Redundant Targets: Under tight embedding, each abstain pattern plays a distinct role, allowing fine-grained control for downstream decision-makers.
Standard performance measures such as rejection rate, accuracy on non-abstained coordinates, and overall Intersection-over-Union (IoU) are reported to quantify these trade-offs.
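The standard measures above can be computed directly from reports and labels; the sketch below restricts accuracy and IoU to non-abstained coordinates, one reasonable convention among several (the paper's exact evaluation protocol may differ):

```python
def abstain_metrics(reports, labels):
    """Coverage/performance metrics for an abstaining structured predictor:
    rejection rate, accuracy on non-abstained coordinates, and IoU of the
    positive class over non-abstained coordinates.

    reports : iterable of vectors in {-1, 0, +1}^n (0 = abstain)
    labels  : iterable of vectors in {-1, +1}^n
    """
    total = abstained = correct = decided = inter = union = 0
    for r, y in zip(reports, labels):
        for ri, yi in zip(r, y):
            total += 1
            if ri == 0:
                abstained += 1
                continue
            decided += 1
            correct += (ri == yi)
            inter += (ri == 1 and yi == 1)
            union += (ri == 1 or yi == 1)
    return {
        "rejection_rate": abstained / total,
        "selective_accuracy": correct / decided if decided else float("nan"),
        "iou": inter / union if union else 1.0,
    }
```

Sweeping the link threshold and plotting `selective_accuracy` against `rejection_rate` recovers the familiar coverage-accuracy curve from selective classification.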
6. Multiclass Generalization via Binary Encodings
While the main construction is for binary structured prediction, a multiclass generalization is developed by combining the binary encoding framework of Ramaswamy et al. (2018) with the polyhedral link construction. Categorical labels are mapped to blocks of binary codes, and abstention can be enacted blockwise or at the level of bits within blocks. The surrogate is then defined over the corresponding product of binary code spaces, and consistency is retained with respect to a “natural multiclass generalization” of the structured abstain problem.
A practical challenge is that not all partial abstain patterns may be semantically meaningful in the multiclass case. The theory shows that dominated or nonsensical partial patterns can, under certain conditions, be pruned or are subsumed by more natural abstain configurations (e.g., entire blocks for multiclass labels).
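The encoding step can be sketched as follows: each of $K$ categorical labels becomes a block of $\lceil \log_2 K \rceil$ bits in $\{-1, +1\}$, after which abstention can be declared bitwise or for an entire block. The fixed standard-binary code used here is an illustrative assumption, not necessarily the paper's specific code:

```python
import math

def encode_labels(labels, num_classes):
    """Sketch of the binary-encoding reduction: map each categorical label
    in {0, ..., num_classes - 1} to a block of ceil(log2(K)) bits in
    {-1, +1}. The standard binary code is an illustrative choice.
    """
    k = max(1, math.ceil(math.log2(num_classes)))
    out = []
    for lab in labels:
        # Bit b of the label, written as -1/+1 instead of 0/1.
        bits = [1 if (lab >> b) & 1 else -1 for b in range(k)]
        out.append(bits)
    return out
```

A blockwise abstention then corresponds to zeroing all bits of one block, which is the "natural" pattern for refusing a whole multiclass label, while mixed patterns within a block are the candidates for pruning described above.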
7. Connections to Selective Classification, Property Elicitation, and Risk-Averse Learning
The structured abstain loss generalizes and unifies several well-studied paradigms:
- Selective Classification: Extends classical selective rejection strategies to structured prediction, where abstention is localized over outputs.
- Property Elicitation: Embedding theory underpins the construction of proper links for surrogates, crucial for ensuring the existence of consistent minimizers.
- Risk-Sensitive Learning: Abstaining provides a mechanism for controlling risk in ambiguous regions (e.g., noisy labels or uninformative features), which is directly generalizable to high-dimensional, structured, or even multiclass output spaces.
Summary Table: Consistency of the Lovász Hinge Across Set Functions
| Set function class | Consistent for standard loss? | Consistent for structured abstain loss? |
|---|---|---|
| Modular | Yes | Yes |
| Polymatroid (submodular) | No | Yes (with calibrated link) |
| Non-submodular | No | No* (embedding theory may not apply) |
*Note: Embedding and calibration results generally assume $f$ is a polymatroid.
In conclusion, the paradigm of “consistency-driven abstention/selection” in structured prediction, realized here via the Lovász hinge with calibrated link constructions, enables principled, interpretable, and risk-aware decision making in high-dimensional and multiclass tasks. The framework provides both a theoretical and an empirical foundation for abstention-centered selective prediction, and it paves the way for extensions to broader structured and multiclass settings while elucidating the inherent limitations of standard convex surrogates when used with non-modular evaluation functions (Finocchiaro et al., 9 May 2025; Finocchiaro et al., 2022).