ClimateGPT: Towards AI Synthesizing Interdisciplinary Research on Climate Change (2401.09646v1)
Abstract: This paper introduces ClimateGPT, a family of domain-specific LLMs that synthesize interdisciplinary research on climate change. We trained two 7B models from scratch on a science-oriented dataset of 300B tokens: for the first, 4.2B domain-specific tokens were included during pre-training; the second was adapted to the climate domain after pre-training. Additionally, ClimateGPT-7B, 13B, and 70B are continuously pre-trained from Llama 2 on a domain-specific dataset of 4.2B tokens. Each model is instruction fine-tuned on a high-quality, human-generated domain-specific dataset created in close cooperation with climate scientists. To reduce hallucinations, we optimize the model for retrieval augmentation and propose a hierarchical retrieval strategy. To make the model accessible to non-English speakers, we propose using cascaded machine translation and show that this approach performs comparably to natively multilingual models while being easier to scale to a large number of languages. Further, to address the intrinsically interdisciplinary nature of climate change, we consider different research perspectives, so the model can produce in-depth answers focused on individual perspectives in addition to an overall answer. We propose a suite of automatic climate-specific benchmarks for evaluating LLMs. On these benchmarks, ClimateGPT-7B performs on par with the ten-times-larger Llama-2-70B Chat model without degrading results on general-domain benchmarks. Our human evaluation confirms the trends observed in our benchmarks. All models were trained and evaluated using renewable energy and are released publicly.
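To make the two system-level ideas in the abstract concrete, here is a minimal sketch of how hierarchical retrieval and a cascaded-translation wrapper could fit together. This is not the authors' implementation: every name below (`doc_index`, `passage_indexes`, `translate`, `llm`, the two-stage document-then-passage reading of "hierarchical retrieval") is an illustrative assumption.

```python
# Hypothetical sketch of hierarchical retrieval plus cascaded machine
# translation around an English-only climate LLM. Interfaces are assumed:
# doc_index / passage index objects expose .search(query, k), the llm
# exposes .generate(prompt), and translate(text, src, tgt) is an external MT
# system, as in a cascaded (translate-in, answer, translate-out) pipeline.
from dataclasses import dataclass


@dataclass
class Passage:
    doc_id: str
    text: str
    score: float


def hierarchical_retrieve(query, doc_index, passage_indexes,
                          k_docs=5, k_passages=3):
    """Two-stage retrieval: rank whole documents first (coarse), then rank
    passages only within the top documents (fine)."""
    top_docs = doc_index.search(query, k=k_docs)  # coarse stage
    passages = []
    for doc in top_docs:
        # fine stage, restricted to each shortlisted document
        passages += passage_indexes[doc.doc_id].search(query, k=k_passages)
    return sorted(passages, key=lambda p: p.score, reverse=True)


def cascaded_answer(question, lang, translate, llm, doc_index, passage_indexes):
    """Cascaded MT wrapper: translate the question into English, answer with
    retrieved context, translate the answer back to the user's language."""
    q_en = translate(question, src=lang, tgt="en") if lang != "en" else question
    context = "\n".join(
        p.text for p in hierarchical_retrieve(q_en, doc_index, passage_indexes)
    )
    answer_en = llm.generate(f"Context:\n{context}\n\nQuestion: {q_en}\nAnswer:")
    return translate(answer_en, src="en", tgt=lang) if lang != "en" else answer_en
```

One property of the cascaded design worth noting: the climate model itself stays monolingual, so adding a new language only requires an MT model for that language, which is why the abstract describes the approach as easier to scale than natively multilingual training.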