CLIMATE-FEVER: A Dataset for Verification of Real-World Climate Claims
The paper addresses the challenge of misinformation about climate change by introducing the CLIMATE-FEVER dataset. The work adapts the FEVER methodology for claim verification to the domain-specific requirements of climate science, applying it to the fact-checking of real-world claims. The dataset comprises 1,535 claims about climate change collected from the Internet; each claim is paired with five evidence sentences retrieved from English Wikipedia, yielding 7,675 annotated claim-evidence pairs, and the dataset is available to the research community.
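As a rough illustration of how the released claim-evidence pairs might be inspected, the sketch below loads the corpus through the Hugging Face datasets hub, where a version is published under the id climate_fever; that hosting choice and the field names shown are assumptions about the public release rather than details taken from the paper.

    # Minimal sketch: browsing CLIMATE-FEVER claim-evidence pairs.
    # Assumes the Hugging Face release under the id "climate_fever";
    # field names reflect that release and may differ from the paper's tables.
    from datasets import load_dataset

    ds = load_dataset("climate_fever", split="test")  # the release ships a single split

    example = ds[0]
    print(example["claim"])        # real-world claim text
    print(example["claim_label"])  # aggregated verdict (encoded as a class index in this release)
    for ev in example["evidences"]:  # five Wikipedia evidence sentences per claim
        print(ev["evidence_label"], ev["evidence"])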
The paper details the construction of CLIMATE-FEVER, beginning with the collection of real-world claims from both scientifically informed and climate-skeptical websites. After more than 3,000 claims were gathered, annotators with expertise in climate science assessed and labeled them, and the collection was refined to a consensus set. For evidence retrieval, a pipeline mirroring the FEVER methodology was constructed: document-level retrieval using entity linking, sentence-level retrieval using dense vector embeddings from a Siamese ALBERT model, and a final sentence re-ranking step.
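To make the sentence-retrieval step concrete, here is a minimal sketch of dense retrieval with a Siamese bi-encoder; the sentence-transformers library, the paraphrase-albert-small-v2 checkpoint, and the toy sentences are illustrative assumptions rather than the authors' exact pipeline.

    # Minimal sketch of dense sentence retrieval for a claim (illustrative only;
    # the checkpoint and library are assumptions, not the authors' exact setup).
    from sentence_transformers import SentenceTransformer, util

    # Any bi-encoder works here; an ALBERT-based checkpoint keeps the spirit of the paper.
    model = SentenceTransformer("paraphrase-albert-small-v2")

    claim = "Global warming is driving polar bears toward extinction."
    candidate_sentences = [
        "Polar bears depend on sea ice, which is declining due to rising temperatures.",
        "The polar bear is a hypercarnivorous bear whose native range lies largely within the Arctic Circle.",
        "Association football is the most popular sport in the world.",
    ]

    claim_emb = model.encode(claim, convert_to_tensor=True)
    sent_embs = model.encode(candidate_sentences, convert_to_tensor=True)

    # Rank candidate evidence sentences by cosine similarity to the claim.
    scores = util.cos_sim(claim_emb, sent_embs)[0]
    ranked = sorted(zip(candidate_sentences, scores.tolist()), key=lambda x: -x[1])
    for sentence, score in ranked:
        print(f"{score:.3f}  {sentence}")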
Beyond supporting the development of evidence-retrieval systems, the dataset serves as a benchmark for claim-verification algorithms. In the reported evaluation, a FEVER-trained entailment predictor achieved only 38.78% label accuracy on CLIMATE-FEVER, underscoring how much more complex and subtle real-world climate claims are to interpret than artificially constructed ones. For instance, statements in CLIMATE-FEVER may involve nuanced metrics or disputed assertions that rarely appear in traditional datasets.
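For concreteness, a label-accuracy check along these lines could look like the sketch below; the MNLI-trained checkpoint, the mapping of its entailment labels onto SUPPORTS/REFUTES/NOT_ENOUGH_INFO, and the toy pairs are assumptions for illustration, not the authors' evaluation protocol.

    # Minimal sketch: scoring an off-the-shelf entailment model on claim-evidence
    # pairs (the checkpoint, label mapping, and examples are assumptions).
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_name = "roberta-large-mnli"  # any MNLI-style entailment model would do
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)

    # Map MNLI-style labels onto FEVER-style verdicts.
    to_verdict = {"ENTAILMENT": "SUPPORTS", "CONTRADICTION": "REFUTES", "NEUTRAL": "NOT_ENOUGH_INFO"}

    pairs = [  # (evidence sentence, claim, gold verdict) -- toy examples only
        ("Arctic sea ice extent has declined in recent decades.", "Arctic sea ice is shrinking.", "SUPPORTS"),
        ("Association football is a popular sport.", "Arctic sea ice is shrinking.", "NOT_ENOUGH_INFO"),
    ]

    correct = 0
    for evidence, claim, gold in pairs:
        inputs = tokenizer(evidence, claim, return_tensors="pt", truncation=True)
        with torch.no_grad():
            pred_id = model(**inputs).logits.argmax(dim=-1).item()
        pred = model.config.id2label[pred_id].upper()
        correct += to_verdict.get(pred, "NOT_ENOUGH_INFO") == gold
    print(f"label accuracy: {correct / len(pairs):.2%}")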
The dataset has several implications for both research and practical applications, chief among them improving the robustness and accuracy of machine-learning models on fact-checking tasks, especially in the nuanced context of environmental science. The low inter-annotator agreement reflects the inherent difficulty of assessing real-world evidence and highlights the need for specialized models beyond what current FEVER-trained systems offer. The authors outline future work on extending the dataset, including better handling of disputed claims and improved evidence-evaluation techniques, thereby fostering collaboration between AI researchers and climate scientists.
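As one way to quantify the agreement issue noted above, a chance-corrected statistic such as Fleiss' kappa can be computed over per-claim annotator votes; the statsmodels routine and the vote counts in the sketch below are illustrative assumptions, not figures from the paper.

    # Minimal sketch: Fleiss' kappa over per-claim annotator votes (toy numbers,
    # not the paper's data; statsmodels is used here purely for illustration).
    import numpy as np
    from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

    # One row per claim, one column per annotator; values are label ids
    # (e.g. 0=SUPPORTS, 1=REFUTES, 2=NOT_ENOUGH_INFO).
    votes = np.array([
        [0, 0, 0, 1, 0],
        [1, 2, 1, 1, 2],
        [2, 2, 0, 2, 2],
        [0, 1, 2, 1, 0],
    ])

    table, _ = aggregate_raters(votes)  # counts of each label per claim
    print(f"Fleiss' kappa: {fleiss_kappa(table):.3f}")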
The authors conclude by stressing the importance of continuing to develop technological solutions that support human fact-checkers rather than replace them, especially in addressing misinformation that affects climate policy and public understanding. Insights from the CLIMATE-FEVER project reveal opportunities for integrating interdisciplinary expertise to advance automated fact-checking.