Assessing Large Language Models on Climate Information (2310.02932v2)

Published 4 Oct 2023 in cs.CL, cs.AI, cs.CY, and cs.LG

Abstract: As LLMs rise in popularity, it is necessary to assess their capability in critically relevant domains. We present a comprehensive evaluation framework, grounded in science communication research, to assess LLM responses to questions about climate change. Our framework emphasizes both presentational and epistemological adequacy, offering a fine-grained analysis of LLM generations spanning 8 dimensions and 30 issues. Our evaluation task is a real-world example of a growing number of challenging problems where AI can complement and lift human performance. We introduce a novel protocol for scalable oversight that relies on AI Assistance and raters with relevant education. We evaluate several recent LLMs on a set of diverse climate questions. Our results point to a significant gap between surface and epistemological qualities of LLMs in the realm of climate communication.
