PAC Prediction Sets for Large Language Models of Code (2302.08703v2)
Abstract: Prediction sets have recently been shown to be a promising strategy for quantifying the uncertainty of deep neural networks in a way that provides theoretical guarantees. However, existing techniques have largely targeted settings where the space of labels is simple, so prediction sets can be arbitrary subsets of labels. For structured prediction problems, where the space of labels is exponential in size, even prediction sets containing a small fraction of all labels can be exponentially large. In the context of code generation, we propose a solution that restricts attention to prediction sets that can be compactly represented as partial programs: programs with portions replaced by holes. Given a trained code generation model, our algorithm leverages the programming language's abstract syntax tree to generate a set of programs such that the correct program is in the set with high confidence. Valuable applications include a Codex-style code generator that leaves holes in uncertain parts of the generated code, yielding a partial program with a theoretical guarantee. We evaluate our approach on PICARD (a T5 model for SQL semantic parsing) and Codex (a GPT model supporting over a dozen programming languages, including Python), demonstrating that it generates compact PAC prediction sets. This is, to our knowledge, the first work to generate PAC prediction sets for generative code models.
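To make the guarantee concrete: a set-valued predictor C is (ε, δ)-PAC, in the sense of the PAC confidence sets of Park et al. cited below, if with probability at least 1 − δ over the calibration data used to build it, Pr[y ∈ C(x)] ≥ 1 − ε. The sketch below illustrates the general recipe rather than the paper's exact algorithm: a hole threshold tau is calibrated with a binomial tail bound, and expression subtrees whose model confidence falls below tau are replaced with holes. The confidence function (`toy_confidence`), the `__HOLE__` marker, and the calibration scores are hypothetical stand-ins for the code model's token probabilities and the paper's hole syntax.

```python
# Minimal sketch: binomial-tail calibration plus AST hole insertion.
# Requires Python 3.9+ (for ast.unparse). Not the authors' exact algorithm.
import ast
import math


def pac_error_budget(n: int, epsilon: float, delta: float) -> int:
    """Largest number of calibration violations k whose binomial tail
    F(k; n, epsilon) is still <= delta; the standard construction for
    (epsilon, delta)-PAC thresholds (cf. Park et al., 2020)."""
    def binom_cdf(k: int) -> float:
        return sum(math.comb(n, i) * epsilon ** i * (1 - epsilon) ** (n - i)
                   for i in range(k + 1))

    budget = -1
    for k in range(n + 1):
        if binom_cdf(k) <= delta:
            budget = k  # CDF is increasing in k, so keep the last valid k
        else:
            break
    return budget  # -1 means n is too small to certify the guarantee


def calibrate_threshold(scores: list[float], epsilon: float, delta: float) -> float:
    """scores[i] is the smallest threshold at which calibration example i's
    true program would be covered; return the smallest tau that leaves at
    most `budget` examples uncovered, keeping partial programs compact."""
    budget = pac_error_budget(len(scores), epsilon, delta)
    if budget < 0:
        return float("inf")  # degenerate fallback: everything becomes a hole
    ranked = sorted(scores, reverse=True)
    return ranked[budget]  # at most `budget` scores strictly exceed tau


class HoleInserter(ast.NodeTransformer):
    """Replace expression subtrees whose confidence falls below tau with a
    `__HOLE__` marker, turning a concrete program into a partial program."""

    def __init__(self, confidence, tau: float):
        self.confidence = confidence
        self.tau = tau

    def visit(self, node: ast.AST) -> ast.AST:
        if isinstance(node, ast.expr) and self.confidence(node) < self.tau:
            hole = ast.Name(id="__HOLE__", ctx=ast.Load())
            return ast.copy_location(hole, node)
        return self.generic_visit(node)


def toy_confidence(node: ast.AST) -> float:
    # Hypothetical stand-in: a real system would aggregate the code model's
    # token probabilities over the subtree; here calls are "uncertain".
    return 0.2 if isinstance(node, ast.Call) else 0.9


if __name__ == "__main__":
    # Hypothetical calibration scores for 20 held-out examples.
    scores = [0.10, 0.15, 0.20, 0.25, 0.30, 0.12, 0.18, 0.22, 0.28, 0.35,
              0.14, 0.16, 0.24, 0.26, 0.32, 0.11, 0.19, 0.21, 0.45, 0.50]
    tau = calibrate_threshold(scores, epsilon=0.2, delta=0.1)
    tree = ast.parse("total = sum(xs) + 1")
    partial = HoleInserter(toy_confidence, tau).visit(tree)
    print(ast.unparse(ast.fix_missing_locations(partial)))
    # prints: total = __HOLE__ + 1
```

Raising tau inserts more holes, which enlarges the set of programs the partial program represents; the binomial error budget trades a handful of calibration violations for a tighter, more informative partial program.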
- A review of uncertainty quantification in deep learning: Techniques, applications and challenges. Information Fusion, 76:243–297, 2021. doi: 10.1016/j.inffus.2021.05.008.
- Uncertainty sets for image classifiers using conformal prediction, 2020. URL https://arxiv.org/abs/2009.14193.
- Distribution-free, risk-controlling prediction sets, 2021. URL https://arxiv.org/abs/2101.02703.
- Language models are few-shot learners, 2020. URL https://arxiv.org/abs/2005.14165.
- Evaluating large language models trained on code, 2021. URL https://arxiv.org/abs/2107.03374.
- Learning to complete code with sketches. In International Conference on Learning Representations, 2021.
- Measuring coding challenge competence with APPS. NeurIPS, 2021.
- The survey: Text generation models in deep learning. Journal of King Saud University - Computer and Information Sciences, 34(6, Part A):2515–2528, 2022. doi: 10.1016/j.jksuci.2020.04.001.
- Autoregressive structured prediction with language models, 2022. URL https://arxiv.org/abs/2210.14698.
- PAC confidence sets for deep neural networks via calibrated prediction, 2020. URL https://arxiv.org/abs/2001.00106.
- Exploring the limits of transfer learning with a unified text-to-text transformer, 2019. URL https://arxiv.org/abs/1910.10683.
- PICARD: Parsing incrementally for constrained auto-regressive decoding from language models, 2021. URL https://arxiv.org/abs/2109.05093.
- Structured prediction for object detection in deep neural networks. In International Conference on Artificial Neural Networks (ICANN), 2014. URL https://www.ais.uni-bonn.de/papers/icann2014_schulz.pdf.
- Neural machine translation of rare words with subword units, 2015. URL https://arxiv.org/abs/1508.07909.
- Solar-Lezama, A. Program synthesis by sketching. PhD thesis, University of California, Berkeley, 2008.
- Sequence to sequence learning with neural networks, 2014. URL https://arxiv.org/abs/1409.3215.
- Conformal prediction under covariate shift. In Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/paper/2019/file/8fb21ee7a2207526da55a679f0332de2-Paper.pdf.
- Algorithmic Learning in a Random World. Springer-Verlag, Berlin, Heidelberg, 2005. ISBN 0387001522.
- Wilks, S. S. Determination of Sample Sizes for Setting Tolerance Limits. The Annals of Mathematical Statistics, 12(1):91–96, 1941. doi: 10.1214/aoms/1177731788.
- Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task, 2018. URL https://arxiv.org/abs/1809.08887.