ClimateX: Do LLMs Accurately Assess Human Expert Confidence in Climate Statements? (2311.17107v1)
Abstract: Evaluating the accuracy of outputs generated by LLMs is especially important in the climate science and policy domain. We introduce the Expert Confidence in Climate Statements (ClimateX) dataset, a novel, curated, expert-labeled dataset consisting of 8094 climate statements collected from the latest Intergovernmental Panel on Climate Change (IPCC) reports, each labeled with its associated confidence level. Using this dataset, we show that recent LLMs can classify human expert confidence in climate-related statements, especially in a few-shot learning setting, but with limited (up to 47%) accuracy. Overall, models exhibit consistent and significant overconfidence on low and medium confidence statements. We highlight implications of our results for climate communication, LLM evaluation strategies, and the use of LLMs in information retrieval systems.
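To make the two quantities in the abstract concrete, the following is a minimal sketch of how classification accuracy and overconfidence could be measured once a model's predicted confidence labels are available. It is not the authors' evaluation code: the four label names, their ordinal encoding, the helper names `accuracy` and `signed_bias_by_level`, and the toy data are all assumptions made for illustration.

```python
# Assumed ordinal encoding of confidence labels (hypothetical; the abstract
# does not list the label set or any numeric scoring).
LEVELS = {"low": 0, "medium": 1, "high": 2, "very high": 3}


def accuracy(true_labels, predicted_labels):
    """Fraction of statements whose predicted label equals the expert label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


def signed_bias_by_level(true_labels, predicted_labels):
    """Mean of (predicted - expert) ordinal score, grouped by expert label.

    A positive value means the model tends to assign higher confidence than
    the experts did for statements at that level (overconfidence).
    """
    diffs = {}
    for t, p in zip(true_labels, predicted_labels):
        diffs.setdefault(t, []).append(LEVELS[p] - LEVELS[t])
    return {level: sum(d) / len(d) for level, d in diffs.items()}


# Toy data, invented for this example only.
true_labels = ["low", "low", "medium", "medium", "high", "very high"]
predicted_labels = ["medium", "high", "high", "medium", "high", "high"]

print(accuracy(true_labels, predicted_labels))
print(signed_bias_by_level(true_labels, predicted_labels))
```

Grouping the signed difference by expert label is one simple way to see whether errors are concentrated on low and medium confidence statements, which is the pattern the abstract reports.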