Corporate Greenwashing Detection in Text -- a Survey (2502.07541v1)

Published 11 Feb 2025 in cs.CL

Abstract: Greenwashing is an effort to mislead the public about the environmental impact of an entity, such as a state or company. We provide a comprehensive survey of the scientific literature addressing natural language processing methods to identify potentially misleading climate-related corporate communications, indicative of greenwashing. We break the detection of greenwashing into intermediate tasks, and review the state-of-the-art approaches for each of them. We discuss datasets, methods, and results, as well as limitations and open challenges. We also provide an overview of how far the field has come as a whole, and point out future research directions.

Summary

The paper surveys natural language processing (NLP) techniques applied to detecting corporate greenwashing in text, reviewing 61 works.
A major challenge identified is the scarcity of annotated datasets specifically tailored for training models to detect greenwashing.
The survey suggests future research should focus on developing comprehensive labeled datasets and integrating NLP with domain-specific knowledge.

Overview of "Corporate Greenwashing Detection in Text - a Survey"

The paper "Corporate Greenwashing Detection in Text - a Survey," authored by Tom Calamai, Oana Balalau, Théo Le Guenedal, and Fabian M. Suchanek, provides a thorough literature review on the application of NLP in detecting corporate greenwashing within text-based communications. This survey systematically examines a broad array of approaches, datasets, and methodologies pertinent to identifying misleading environmental claims by corporations.

Key Insights and Structure

Conceptual Framework of Greenwashing: The survey begins by clarifying the concept of greenwashing, defining it as the practice whereby corporations present misleading information regarding their environmental impact. This serves as an essential foundation for examining the detection techniques.
Survey Methodology: The authors divide the detection of greenwashing into a series of intermediate NLP tasks. For each task, they review the state-of-the-art methods, the datasets, and the reported results. In total, the survey covers 61 works.
Annotated Datasets: A critical issue is the scarcity of annotated datasets tailored to greenwashing detection. The authors find that, while indirect indicators of greenwashing have been studied, datasets dedicated to greenwashing itself remain undeveloped. This is a barrier to training machine learning models specifically to detect greenwashing.
NLP Task Classification: The survey groups the intermediate NLP tasks into categories that include climate-related text detection, thematic analysis, claim detection, stance detection, and the detection of deceptive techniques, among others. Each category is examined in terms of the datasets used, the models applied, and the results achieved. This segmentation helps researchers pinpoint the areas that need further exploration or improvement.
Performance and Models: The paper examines models applied to these intermediate tasks, including BERT, RoBERTa, and domain-specific models such as ClimateBERT. It reports empirical results and notes where models perform well and where they fall short, particularly in robustness and real-world applicability.
Challenges and Opportunities: The survey sets out the main challenges facing greenwashing detection: ambiguity in how greenwashing is defined, the lack of annotated data, and the difficulty of validating methods empirically. It proposes future research directions that bring regulatory frameworks together with advances in NLP.
Future Directions: The authors call for comprehensive, labeled datasets that would allow NLP-based greenwashing detection to be applied in practice and at a broader scale. They also suggest that integrating NLP tools with domain-specific knowledge could make greenwashing detection more reliable.
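
To make the idea of intermediate tasks concrete, the following is a minimal sketch of how three of the stages listed above (climate-related text detection, claim detection, and a simple check for supporting figures) could be chained. It is not a method from the survey: the keyword lists, the `SentenceAnalysis` type, the `analyze` function, and the `needs_review` rule are all hypothetical placeholders chosen for illustration. A realistic system would replace each stage with a trained classifier, and the final flag is only a triage heuristic, not a greenwashing verdict.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword lists for illustration only; they are not taken
# from the survey or from any published lexicon.
CLIMATE_TERMS = {"climate", "emissions", "carbon", "renewable",
                 "net-zero", "sustainability"}
CLAIM_CUES = {"we", "our", "committed", "will", "achieve",
              "reduce", "reduced"}
FIGURE_PATTERN = re.compile(r"\d")  # any digit counts as a concrete figure


@dataclass
class SentenceAnalysis:
    text: str
    climate_related: bool  # stage 1: is the sentence about climate?
    is_claim: bool         # stage 2: does the company assert something?
    has_figure: bool       # stage 3: is any number cited?

    @property
    def needs_review(self) -> bool:
        # Assumed triage rule: a climate-related claim with no figure
        # is passed to a human reviewer.
        return self.climate_related and self.is_claim and not self.has_figure


def tokenize(sentence: str) -> set:
    # Lowercase word tokens; hyphens are kept so "net-zero" stays whole.
    return set(re.findall(r"[a-z][a-z\-]*", sentence.lower()))


def analyze(sentence: str) -> SentenceAnalysis:
    tokens = tokenize(sentence)
    climate = bool(tokens & CLIMATE_TERMS)
    # Claim detection only runs on sentences that passed stage 1.
    claim = climate and bool(tokens & CLAIM_CUES)
    figure = bool(FIGURE_PATTERN.search(sentence))
    return SentenceAnalysis(sentence, climate, claim, figure)


if __name__ == "__main__":
    for s in ["We are committed to a net-zero future.",
              "We reduced carbon emissions by 12% in 2023.",
              "Quarterly revenue grew in all regions."]:
        print(analyze(s).needs_review, "-", s)
```

Each stage filters the input of the next, which mirrors the survey's decomposition: an error in an early stage propagates downstream, so each intermediate task can be evaluated and improved separately.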

Implications and Broader Impact

The implications of this paper extend beyond academic research into the practical field, where reliable detection of greenwashing would aid regulatory bodies, investors, and consumers in holding corporations accountable for their environmental impact claims. As the field continues to mature, the paper highlights the potential societal benefits of employing advanced NLP technologies to mitigate deceptive corporate practices.

Concluding Remarks

The paper maps the current landscape of NLP-based greenwashing detection in text. Alongside its broad review of methods and datasets, it urges the research community to overcome the barriers it identifies, above all the lack of dedicated annotated data. The survey is a valuable resource for future work on efficient, reliable tools for greenwashing detection.

Overall, this survey refrains from sensationalism, adopting a methodical approach to summarizing the current academic discourse, while pragmatically suggesting areas requiring focused research and development.