A tutorial on conformal prediction (0706.3188v1)

Published 21 Jun 2007 in cs.LG and stat.ML

Abstract: Conformal prediction uses past experience to determine precise levels of confidence in new predictions. Given an error probability $\epsilon$, together with a method that makes a prediction $\hat{y}$ of a label $y$, it produces a set of labels, typically containing $\hat{y}$, that also contains $y$ with probability $1-\epsilon$. Conformal prediction can be applied to any method for producing $\hat{y}$: a nearest-neighbor method, a support-vector machine, ridge regression, etc. Conformal prediction is designed for an on-line setting in which labels are predicted successively, each one being revealed before the next is predicted. The most novel and valuable feature of conformal prediction is that if the successive examples are sampled independently from the same distribution, then the successive predictions will be right $1-\epsilon$ of the time, even though they are based on an accumulating dataset rather than on independent datasets. In addition to the model under which successive examples are sampled independently, other on-line compression models can also use conformal prediction. The widely used Gaussian linear model is one of these. This tutorial presents a self-contained account of the theory of conformal prediction and works through several numerical examples. A more comprehensive treatment of the topic is provided in "Algorithmic Learning in a Random World", by Vladimir Vovk, Alex Gammerman, and Glenn Shafer (Springer, 2005).

Summary

  • The paper introduces conformal prediction as a distribution-free method to generate valid prediction regions using past data and nonconformity measures.
  • It explains both theory and practice with examples like Iris classification and petal width regression that illustrate varying prediction efficiencies.
  • The tutorial extends the framework to online compression models and exchangeability principles, bolstering its robustness for diverse machine learning tasks.

Overview of a Tutorial on Conformal Prediction

The tutorial paper by Glenn Shafer and Vladimir Vovk provides a detailed exploration of conformal prediction, a statistical technique designed to quantify the uncertainty in machine learning predictions. Unlike traditional methods, conformal prediction offers a distribution-free way to attach confidence levels to predictions using past experience.

Key Concepts and Definitions

At its core, conformal prediction depends on two main constructs:

  1. Prediction Regions: Given a sequence of examples and a new observation, conformal prediction generates prediction regions with a predefined confidence level. These regions are sets that contain the true label of the new observation with a specified probability (e.g., 95%).
  2. Nonconformity Measures: These measures assess how different a new observation is from previous ones. The choice of nonconformity measure significantly affects the efficiency and validity of the prediction regions.
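
In symbols (notation introduced here for concreteness), suppose a candidate label $y$ is provisionally attached to the new example and nonconformity scores $\alpha_1, \dots, \alpha_{n+1}$ are computed for all $n+1$ examples. The two constructs combine into a single rule: compute the p-value of $y$ and keep $y$ in the region whenever that p-value exceeds the significance level $\epsilon$:

$$p_y = \frac{\#\{\, i = 1, \dots, n+1 : \alpha_i \ge \alpha_{n+1} \,\}}{n+1}, \qquad \Gamma^{\epsilon} = \{\, y : p_y > \epsilon \,\}.$$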

Methodological Framework

The tutorial distinguishes two scenarios where conformal prediction can be applied:

  1. Prediction from Old Examples Alone: Here, prediction is based purely on past examples without considering new instance features.
  2. Prediction Using Features of New Objects: In this scenario, predictions are made considering features of new instances along with past examples.

In both cases, the conformal algorithm operates by testing every possible new observation $z$ (or label $y$): a candidate is included in the prediction region when the fraction of examples whose nonconformity scores are at least as large as its own, its p-value, exceeds the chosen significance level (e.g., 5%).
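
A minimal Python sketch of this loop for the second scenario appears below, assuming a user-supplied nonconformity function; the `nearest_neighbour_score` helper is a hypothetical implementation of the nearest-neighbour measure discussed for the Iris example, not code from the paper.

```python
import numpy as np

def conformal_region(X, y, x_new, candidate_labels, nonconformity, epsilon=0.05):
    """Full conformal prediction for classification.

    For each candidate label, the new example is provisionally added to the
    dataset, nonconformity scores are computed for every example, and the
    label is kept if the resulting p-value exceeds epsilon.
    """
    X, y = np.asarray(X, float), np.asarray(y)
    region = []
    for label in candidate_labels:
        X_aug = np.vstack([X, x_new])      # extended objects
        y_aug = np.append(y, label)        # hypothesised label for x_new
        scores = np.array([nonconformity(X_aug, y_aug, i)
                           for i in range(len(y_aug))])
        # p-value: fraction of examples at least as nonconforming as the new one.
        p_value = np.mean(scores >= scores[-1])
        if p_value > epsilon:
            region.append(label)
    return region

def nearest_neighbour_score(X_aug, y_aug, i):
    """Nonconformity of example i: distance to the nearest example with the
    same label divided by the distance to the nearest example with a
    different label (assumes both kinds of example are present)."""
    d = np.linalg.norm(X_aug - X_aug[i], axis=1)
    d[i] = np.inf                          # ignore the example itself
    same = d[y_aug == y_aug[i]].min()
    other = d[y_aug != y_aug[i]].min()
    return same / other
```

With the Iris data, `candidate_labels` would be the three species and `epsilon=0.05` corresponds to the 95% regions discussed below.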

Theoretical Underpinnings and Validity

The paper underscores the validity of conformal predictors through the notions of exchangeability and on-line compression models. Under exchangeability, every ordering of the observed examples is assumed to be equally probable. This assumption alone is enough for the prediction regions to be correctly calibrated over time. Specifically, the paper demonstrates:

  • Exact Independence under Exchangeability: By decomposing the on-line protocol into per-step error events and proving that these events are mutually independent, the paper shows that the classical weak law of large numbers applies to the sequence of errors.
  • Game-Theoretic Approach: Invoking Cournot's principle, the paper argues that a bettor cannot multiply their initial capital by a large factor over many trials, which yields a game-theoretic law of large numbers and the same calibration guarantee.
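
In symbols (notation introduced here for exposition rather than quoted from the paper), let $\mathrm{err}_n$ equal $1$ when the region produced at step $n$ fails to contain the true label and $0$ otherwise. Validity then says that each error event has probability at most $\epsilon$, and the mutual independence of these events lets the law of large numbers control the long-run error frequency:

$$\Pr(\mathrm{err}_n = 1) \le \epsilon \quad \text{for every } n, \qquad \limsup_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \mathrm{err}_n \le \epsilon \quad \text{almost surely}.$$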

Practical Illustrations

The tutorial provides concrete examples to elucidate the theory:

  1. Classifying Iris Flowers: Using Fisher’s Iris dataset, the paper demonstrates three different nonconformity measures (nearest neighbor, distance to species average, and support vector machines) to predict the species of a new flower. The resulting prediction regions show how different measures can lead to varying confidence levels and prediction efficiencies.
  2. Predicting Petal Width: Here, the prediction of a numerical attribute (petal width) from another numerical attribute (sepal length) is illustrated. With least-squares linear regression as the underlying predictor, the conformal prediction intervals are comparable to classical statistical intervals while remaining valid without the normality assumption.
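
The regression example can be sketched in the same style. The snippet below is an illustrative approximation rather than the paper's derivation: it uses the absolute residual from a least-squares fit on the extended dataset as the nonconformity measure and scans a grid of candidate values for the new label, whereas the tutorial obtains the interval analytically. The data in the usage comment is synthetic, standing in for the sepal-length/petal-width measurements.

```python
import numpy as np

def conformal_interval(x, y, x_new, epsilon=0.05, grid=None):
    """Conformal prediction interval for simple least-squares regression,
    using the absolute residual on the extended dataset as the nonconformity
    measure and a grid search over candidate values of the new label."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    if grid is None:
        span = y.max() - y.min()
        grid = np.linspace(y.min() - span, y.max() + span, 2001)
    included = []
    for y_cand in grid:
        x_aug = np.append(x, x_new)
        y_aug = np.append(y, y_cand)
        slope, intercept = np.polyfit(x_aug, y_aug, 1)   # refit with the candidate
        residuals = np.abs(y_aug - (intercept + slope * x_aug))
        # p-value: fraction of residuals at least as large as the new one.
        if np.mean(residuals >= residuals[-1]) > epsilon:
            included.append(y_cand)
    return (min(included), max(included)) if included else None

# Hypothetical usage with synthetic data (not the actual Iris measurements):
# rng = np.random.default_rng(0)
# sepal = rng.uniform(4.5, 7.5, 30)
# petal = 0.75 * sepal - 3.0 + rng.normal(0.0, 0.2, 30)
# print(conformal_interval(sepal, petal, x_new=6.0))    # approximate 95% interval
```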

Extensions to On-line Compression Models

The paper extends the conformal prediction paradigm to more general scenarios using on-line compression models. Two key models explored are:

  1. Exchangeability-Within-Label Model: This relaxes the assumption of global exchangeability to within-class exchangeability, thus providing better-calibrated predictions within each class.
  2. On-line Gaussian Linear Model: Here, the classical Gaussian linear model's assumptions are adapted to an on-line context, with conformal prediction producing exact classical prediction intervals.
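
For the simple-regression special case of this model, the interval that conformal prediction recovers is the familiar textbook one (stated here from classical statistics rather than copied from the tutorial): with least-squares prediction $\hat{y}_{n+1}$ at the new point $x_{n+1}$, residual standard error $s$, and $t$-quantile $t_{n-2}^{\,1-\epsilon/2}$,

$$\hat{y}_{n+1} \;\pm\; t_{n-2}^{\,1-\epsilon/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}.$$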

Implications and Future Directions

The conformal prediction framework has significant implications for both theoretical and applied machine learning:

  • Theoretical Robustness: By avoiding strong model assumptions and relying on exchangeability, conformal prediction offers a more robust framework, particularly useful in non-parametric settings or when traditional model assumptions (e.g., normality) are untenable.
  • Practical Use: Conformal prediction gives practitioners a tool that can be wrapped around existing prediction algorithms, producing calibrated and interpretable prediction regions, which is invaluable in critical decision-making domains such as healthcare, finance, and autonomous systems.

Conclusion

Shafer and Vovk's tutorial on conformal prediction thoroughly encapsulates the method's theoretical underpinnings, practical applications, and potential extensions. By focusing on producing valid, distribution-free prediction intervals, conformal prediction stands out as a versatile and robust approach to uncertainty quantification in machine learning, pointing the way to future developments in creating highly reliable AI systems.
