Principled Approaches for Learning to Defer with Multiple Experts (2310.14774v2)
Abstract: We present a study of surrogate losses and algorithms for the general problem of learning to defer with multiple experts. We first introduce a new family of surrogate losses specifically tailored for the multiple-expert setting, where the prediction and deferral functions are learned simultaneously. We then prove that these surrogate losses benefit from strong $H$-consistency bounds. We illustrate the application of our analysis through several examples of practical surrogate losses, for which we give explicit guarantees. These loss functions readily lead to the design of new learning to defer algorithms based on their minimization. While the main focus of this work is a theoretical analysis, we also report the results of several experiments on the SVHN and CIFAR-10 datasets.
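To make the setting concrete, the sketch below shows the general shape of a softmax-based surrogate for multi-expert deferral, in the spirit of the consistent surrogates studied in the works cited below (e.g., Mozannar and Sontag, 2020; Verma et al., 2023). This is an illustrative example under our own assumptions, not the paper's exact loss family: the model outputs one score per class plus one deferral score per expert, and the surrogate charges log-loss for the true label and, for each expert whose prediction happens to be correct, log-loss on that expert's deferral score. The function name and signature are hypothetical.

```python
import math

def defer_surrogate_loss(scores, y, expert_preds):
    """Illustrative multi-expert deferral surrogate (log-loss style).

    scores: list of length n_classes + n_experts -- scores for the
        n_classes label predictions, followed by one deferral score
        per expert.
    y: true label, an int in [0, n_classes).
    expert_preds: list of each expert's predicted label.
    """
    n_experts = len(expert_preds)
    n_classes = len(scores) - n_experts
    # Log-normalizer of the softmax over all class and deferral scores.
    log_z = math.log(sum(math.exp(s) for s in scores))
    # Log-loss for the model predicting the true label itself.
    loss = -(scores[y] - log_z)
    # For each expert that is correct on this example, also charge
    # log-loss on the corresponding deferral score, so deferring to a
    # reliable expert is rewarded at minimization.
    for j, pred in enumerate(expert_preds):
        if pred == y:
            loss -= scores[n_classes + j] - log_z
    return loss
```

Minimizing such a loss jointly learns the predictor and the deferral rule: the deferral score of expert $j$ is pushed up only on examples where that expert is correct.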
- Budget learning via bracketing. In International Conference on Artificial Intelligence and Statistics, pages 4109–4119, 2020.
- Calibration and consistency of adversarial surrogate losses. In Advances in Neural Information Processing Systems, 2021a.
- On the existence of the adversarial Bayes classifier. In Advances in Neural Information Processing Systems, pages 2978–2990, 2021b.
- A finer calibration analysis for adversarial robustness. arXiv preprint arXiv:2105.01550, 2021c.
- Multi-class $\mathscr{H}$-consistency bounds. In Advances in Neural Information Processing Systems, 2022a.
- $\mathscr{H}$-consistency bounds for surrogate loss minimizers. In International Conference on Machine Learning, 2022b.
- Theoretically grounded loss functions and algorithms for adversarial robustness. In International Conference on Artificial Intelligence and Statistics, pages 10077–10094, 2023.
- DC-programming for neural network optimizations. Journal of Global Optimization, pages 1–17, 2024.
- Is the most accurate AI the best teammate? Optimizing AI for teamwork. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11405–11414, 2021.
- Classification with a reject option using a hinge loss. Journal of Machine Learning Research, 9(8), 2008.
- Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473):138–156, 2006.
- Nina L Corvelo Benz and Manuel Gomez Rodriguez. Counterfactual inference of second opinions. In Uncertainty in Artificial Intelligence, pages 453–463. PMLR, 2022.
- Joseph Berkson. Application of the logistic function to bio-assay. Journal of the American Statistical Association, 39:357–365, 1944.
- Joseph Berkson. Why I prefer logits to probits. Biometrics, 7(4):327–339, 1951.
- Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv preprint arXiv:2303.12712, 2023.
- Generalizing consistent multi-class classification with rejection to be compatible with arbitrary losses. In Advances in Neural Information Processing Systems, 2022.
- In defense of softmax parametrization for calibrated and consistent learning to defer. In Advances in Neural Information Processing Systems, 2023.
- Classification with rejection based on cost-sensitive classification. In International Conference on Machine Learning, pages 1507–1517, 2021.
- Sample efficient learning of predictors that complement humans. In International Conference on Machine Learning, pages 2972–3005, 2022.
- Learning to make adherence-aware advice. In International Conference on Learning Representations, 2024.
- Regression with cost-based rejection. In Advances in Neural Information Processing Systems, 2023.
- C. Chow. On optimum recognition error and reject tradeoff. IEEE Transactions on Information Theory, 16(1):41–46, 1970.
- C. K. Chow. An optimum character recognition system using decision function. IEEE T. C., 1957.
- Learning with rejection. In International Conference on Algorithmic Learning Theory, pages 67–82, 2016a.
- Boosting with abstention. In Advances in Neural Information Processing Systems, pages 1660–1668, 2016b.
- Theory and algorithms for learning with rejection in binary classification. Annals of Mathematics and Artificial Intelligence, pages 1–39, 2023.
- Regression under human assistance. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 2611–2620, 2020.
- Ran El-Yaniv et al. On the foundations of noise-free selective classification. Journal of Machine Learning Research, 11(5), 2010.
- Selective classification via one-sided prediction. In International Conference on Artificial Intelligence and Statistics, pages 2179–2187, 2021.
- Human-AI collaboration with bandit feedback. arXiv preprint arXiv:2105.10614, 2021.
- Selective classification for deep neural networks. In Advances in Neural Information Processing Systems, 2017.
- SelectiveNet: A deep neural network with an integrated reject option. In International Conference on Machine Learning, pages 2151–2159, 2019.
- Support vector machines with a reject option. In Advances in Neural Information Processing Systems, 2008.
- Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
- Forming effective human-AI teams: Building machine learning models that complement the capabilities of multiple experts. arXiv preprint arXiv:2206.07948, 2022.
- Learning to defer with limited expert predictions. arXiv preprint arXiv:2304.07306, 2023.
- Classification with reject option. Canadian Journal of Statistics, 2005.
- Pre-emptive learning-to-defer for sequential medical decision-making under uncertainty. arXiv preprint arXiv:2109.06312, 2021.
- Reliable agnostic learning. Journal of Computer and System Sciences, 78(5):1481–1495, 2012.
- Combining human and machine intelligence in large-scale crowdsourcing. In AAMAS, pages 467–474, 2012.
- Combining human predictions with model probabilities via confusion matrices and calibration. Advances in Neural Information Processing Systems, 34:4421–4434, 2021.
- Towards unbiased and accurate deferral to multiple experts. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, pages 154–165, 2021.
- Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Human decisions and machine predictions. The Quarterly Journal of Economics, 133(1):237–293, 2018.
- Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
- Multi-class deep boosting. In Advances in Neural Information Processing Systems, pages 2501–2509, 2014.
- Multicategory support vector machines: Theory and application to the classification of microarray data and satellite radiance data. Journal of the American Statistical Association, 99(465):67–81, 2004.
- When no-rejection learning is optimal for regression with rejection. In International Conference on Artificial Intelligence and Statistics, 2024.
- Incorporating uncertainty in learning to defer algorithms for safe computer-aided diagnosis. Scientific Reports, 12(1):1762, 2022.
- Consistency versus realizable H-consistency for multiclass classification. In International Conference on Machine Learning, pages 801–809, 2013.
- Learning adversarially fair and transferable representations. arXiv preprint arXiv:1802.06309, 2018.
- Two-stage learning to defer with multiple experts. In Advances in Neural Information Processing Systems, 2023a.
- H-consistency bounds: Characterization and extensions. In Advances in Neural Information Processing Systems, 2023b.
- H-consistency bounds for pairwise misranking loss surrogates. In International Conference on Machine Learning, 2023c.
- Ranking with abstention. In ICML 2023 Workshop The Many Facets of Preference-Based Learning, 2023d.
- Structured prediction with stronger consistency guarantees. In Advances in Neural Information Processing Systems, 2023e.
- Cross-entropy loss functions: Theoretical analysis and applications. In International Conference on Machine Learning, 2023f.
- Predictor-rejector multi-class abstention: Theoretical analysis and algorithms. In International Conference on Algorithmic Learning Theory, 2024a.
- Theoretically grounded loss functions and algorithms for score-based multi-class abstention. In International Conference on Artificial Intelligence and Statistics, 2024b.
- $\mathscr{H}$-consistency guarantees for regression. arXiv preprint arXiv:2403.19480, 2024c.
- Regression with multi-expert deferral. arXiv preprint arXiv:2403.19494, 2024d.
- Top-$k$ classification and cardinality-aware prediction. arXiv preprint arXiv:2403.19625, 2024e.
- Learning to reject with a fixed predictor: Application to decontextualization. In International Conference on Learning Representations, 2024.
- Foundations of Machine Learning. MIT Press, second edition, 2018.
- Consistent estimators for learning to defer to an expert. In International Conference on Machine Learning, pages 7076–7087, 2020.
- Teaching humans when to defer to a classifier via exemplars. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 5323–5331, 2022.
- Post-hoc estimators for learning to defer to an expert. In Advances in Neural Information Processing Systems, pages 29292–29304, 2022.
- Learning to reject meets OOD detection: Are all abstentions created equal? arXiv preprint arXiv:2301.12386, 2023.
- Reading digits in natural images with unsupervised feature learning. In Advances in Neural Information Processing Systems, 2011.
- On the calibration of multiclass classification with rejection. In Advances in Neural Information Processing Systems, pages 2582–2592, 2019.
- Differentiable learning under triage. Advances in Neural Information Processing Systems, 34:9140–9151, 2021.
- Preferential mixture-of-experts: Interpretable models that rely on human expertise as much as possible. AMIA Summits on Translational Science Proceedings, 2021:525, 2021.
- The algorithmic automation problem: Prediction, triage, and human effort. arXiv preprint arXiv:1903.12220, 2019.
- Improving learning-to-defer algorithms through fine-tuning. arXiv preprint arXiv:2112.10768, 2021.
- Consistent algorithms for multiclass classification with an abstain option. Electronic Journal of Statistics, 12(1):530–554, 2018.
- Ingo Steinwart. How to compare different loss functions and their risks. Constructive Approximation, 26(2):225–287, 2007.
- Reinforcement learning under algorithmic triage. arXiv preprint arXiv:2109.11328, 2021.
- Provably improving expert predictions with conformal prediction. arXiv preprint arXiv:2201.12006, 2022.
- Investigating human+machine complementarity for recidivism predictions. arXiv preprint arXiv:1808.09123, 2018.
- Pierre François Verhulst. Notice sur la loi que la population suit dans son accroissement. Correspondance mathématique et physique, 10:113–121, 1838.
- Pierre François Verhulst. Recherches mathématiques sur la loi d'accroissement de la population. Nouveaux Mémoires de l'Académie Royale des Sciences et Belles-Lettres de Bruxelles, 18:1–42, 1845.
- Calibrated learning to defer with one-vs-all classifiers. In International Conference on Machine Learning, pages 22184–22202, 2022.
- Learning to defer to multiple experts: Consistent surrogate losses, confidence calibration, and conformal ensembles. In International Conference on Artificial Intelligence and Statistics, pages 11415–11434, 2023.
- Emergent abilities of large language models. CoRR, abs/2206.07682, 2022.
- Multi-class support vector machines. Technical report, Citeseer, 1998.
- Agnostic selective classification. In Advances in Neural Information Processing Systems, 2011.
- Learning to complement humans. In International Joint Conference on Artificial Intelligence, pages 1526–1533, 2021.
- Classification methods with reject option based on convex risk minimization. Journal of Machine Learning Research, 11(1), 2010.
- SVMs with a reject option. Bernoulli, 2011.
- Bayes consistency vs. H-consistency: The interplay between surrogate loss functions and the scoring function class. In Advances in Neural Information Processing Systems, 2020.
- Tong Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. The Annals of Statistics, 32(1):56–85, 2004.
- Generalized cross entropy loss for training deep neural networks with noisy labels. In Advances in Neural Information Processing Systems, 2018.
- Directing human attention in event localization for clinical timeline creation. In Machine Learning for Healthcare Conference, pages 80–102, 2021.
- Revisiting discriminative vs. generative classifiers: Theory and implications. arXiv preprint arXiv:2302.02334, 2023.
- Deep gamblers: Learning to abstain with portfolio theory. arXiv preprint arXiv:1907.00208, 2019.