
HoneyBee: Progressive Instruction Finetuning of Large Language Models for Materials Science (2310.08511v1)

Published 12 Oct 2023 in cs.CL, cond-mat.mtrl-sci, and cs.AI

Abstract: We propose an instruction-based process for trustworthy data curation in materials science (MatSci-Instruct), which we then apply to finetune a LLaMA-based LLM targeted at materials science (HoneyBee). MatSci-Instruct helps alleviate the scarcity of relevant, high-quality materials science textual data available in the open literature, and HoneyBee is the first billion-parameter LLM specialized to materials science. In MatSci-Instruct we improve the trustworthiness of the generated data by prompting multiple commercially available LLMs, using an Instructor module (e.g., ChatGPT) for generation and an independent Verifier module (e.g., Claude) for verification. Using MatSci-Instruct, we construct a dataset spanning multiple tasks and measure its quality along several dimensions, including accuracy against known facts, relevance to materials science, and completeness and reasonableness of the data. Moreover, we iteratively generate more targeted instructions and instruction-data in a finetuning-evaluation-feedback loop, leading to progressively better performance for our finetuned HoneyBee models. Our evaluation on the MatSci-NLP benchmark shows that HoneyBee outperforms existing LLMs on materials science tasks and improves iteratively across successive stages of instruction-data refinement. We study the quality of HoneyBee's language modeling through automatic evaluation and analyze case studies to further understand the model's capabilities and limitations. Our code and relevant datasets are publicly available at https://github.com/BangLab-UdeM-Mila/NLP4MatSci-HoneyBee.
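The abstract describes an Instructor-Verifier curation loop feeding a finetuning-evaluation-feedback cycle. The sketch below is a minimal illustration of how one generation-and-verification round of such a loop could be wired together; the `instructor`/`verifier` callables, the quality dimensions as dictionary keys, and the 0.7 acceptance threshold are assumptions for illustration, not the paper's exact prompts or criteria.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interfaces: an "instructor" LLM drafts candidate instruction-data
# and an independent "verifier" LLM scores it along the trust dimensions named
# in the abstract (accuracy, relevance, completeness, reasonableness).

@dataclass
class InstructionExample:
    instruction: str
    response: str
    scores: dict

def matsci_instruct_round(
    instructor: Callable[[str], List[dict]],   # e.g., a wrapper around ChatGPT
    verifier: Callable[[str, str], dict],      # e.g., a wrapper around Claude
    seed_topics: List[str],
    min_score: float = 0.7,                    # illustrative threshold
) -> List[InstructionExample]:
    """One generation-verification round of a MatSci-Instruct-style loop."""
    accepted: List[InstructionExample] = []
    for topic in seed_topics:
        # 1) Instructor module drafts instruction/response pairs for the topic.
        candidates = instructor(
            f"Generate materials-science instruction data about: {topic}"
        )
        for cand in candidates:
            # 2) Verifier module rates each pair on the trust dimensions.
            scores = verifier(cand["instruction"], cand["response"])
            # 3) Keep only pairs whose weakest dimension clears the threshold.
            if min(scores.values()) >= min_score:
                accepted.append(
                    InstructionExample(cand["instruction"], cand["response"], scores)
                )
    return accepted

if __name__ == "__main__":
    # Stub instructor/verifier so the sketch runs without any API access.
    demo_instructor = lambda prompt: [
        {"instruction": "Define yield strength.",
         "response": "The stress at which a material begins to deform plastically."}
    ]
    demo_verifier = lambda instr, resp: {
        "accuracy": 0.9, "relevance": 0.95, "completeness": 0.8, "reasonableness": 0.9
    }
    data = matsci_instruct_round(demo_instructor, demo_verifier, ["mechanical properties"])
    print(f"Accepted {len(data)} verified examples for finetuning.")
```

In the paper's progressive setup, the accepted data from each round is used to finetune HoneyBee, and evaluation feedback on the finetuned model then steers the next round of instruction generation.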
