Conformalization of Sparse Generalized Linear Models (2307.05109v1)
Abstract: Given a sequence of observable variables $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, the conformal prediction method estimates a confidence set for $y_{n+1}$ given $x_{n+1}$ that is valid for any finite sample size, assuming only that the joint distribution of the data is permutation invariant. Although the approach is attractive, computing such a set is infeasible in most regression problems. Indeed, in these cases, the unknown variable $y_{n+1}$ can take infinitely many candidate values, and generating conformal sets requires retraining a predictive model for each candidate. In this paper, we focus on a sparse linear model that uses only a subset of the variables for prediction, and we use numerical continuation techniques to approximate the solution path efficiently. The critical property we exploit is that the set of selected variables is invariant under a small perturbation of the input data. Therefore, it is sufficient to enumerate and refit the model only at the change points of the set of active features and to smoothly interpolate the rest of the solution via a Predictor-Corrector mechanism. We show that our path-following algorithm accurately approximates conformal prediction sets and illustrate its performance on synthetic and real data examples.
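To make the computational bottleneck concrete, the sketch below implements the brute-force baseline the abstract describes: for every candidate value of $y_{n+1}$ on a finite grid, a sparse model is refit on the augmented data and a conformal p-value is computed. It is not the paper's path-following algorithm. The Lasso objective, the absolute-residual score, the grid, the threshold rule (p-value strictly above $\alpha$), and all function names (`lasso_cd`, `conformal_pvalue`, `grid_conformal_set`) are illustrative assumptions, and the data are synthetic.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of t * |.|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for min_b 0.5 * ||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, starting from beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Inner product of feature j with the partial residual.
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new_bj
    return beta


def conformal_pvalue(X, y, x_new, z, lam):
    """Refit on the data augmented with (x_new, z); rank the last residual."""
    X_aug = X + [x_new]
    y_aug = y + [z]
    beta = lasso_cd(X_aug, y_aug, lam)
    scores = [
        abs(yi - sum(b * v for b, v in zip(beta, xi)))
        for xi, yi in zip(X_aug, y_aug)
    ]
    # Fraction of scores at least as large as the candidate's own score.
    return sum(s >= scores[-1] for s in scores) / len(scores)


def grid_conformal_set(X, y, x_new, candidates, lam, alpha):
    """Keep candidates whose p-value exceeds alpha (one model fit per candidate)."""
    return [z for z in candidates if conformal_pvalue(X, y, x_new, z, lam) > alpha]


# Synthetic data with a sparse ground truth (assumption for illustration only).
random.seed(0)
n, p = 20, 4
true_beta = [2.0, 0.0, -1.0, 0.0]
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [sum(b * v for b, v in zip(true_beta, xi)) + random.gauss(0, 0.5) for xi in X]
x_new = [random.gauss(0, 1) for _ in range(p)]
candidates = [-8 + 0.5 * k for k in range(33)]  # grid on [-8, 8]

conf_set = grid_conformal_set(X, y, x_new, candidates, lam=1.0, alpha=0.1)
if conf_set:
    print("grid conformal set spans", min(conf_set), "to", max(conf_set))
```

The cost is one full model fit per grid point, and the grid only approximates the continuum of candidates. This is the expense that refitting only at changes of the active set is meant to avoid.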