
Greenwashing Claims Dataset

Updated 19 December 2025
  • Greenwashing claims datasets are curated collections of annotated corporate communications used to detect misleading environmental claims.
  • They use diverse annotation methods, from expert review to LLM-based labeling, aiming for a reliable distinction between truthful and misleading claims.
  • The datasets support multimodal and multilingual research, enabling robust benchmarking and development of automated greenwashing detection models.

Greenwashing claims datasets are collections of textual, visual, or multimodal instances—spanning reports, ads, or other corporate communications—annotated to indicate whether statements about environmental performance are truthful or misleading (i.e., greenwashing). These datasets provide ground truth or proxy labels for empirical research on the detection, measurement, and analysis of greenwashing across varied modalities, sectors, and linguistic domains. Recent advances have produced the first datasets with explicit greenwashing ground-truth labels, nuanced annotation protocols (e.g., justification ratings, evidence clarity), and both unimodal and multimodal (vision-language) coverage, enabling robust benchmarking and model development for automated greenwashing detection.

1. Definitions and Scope

Greenwashing is operationalized as “the act of misleading consumers regarding the environmental practices of a company or the environmental benefits of a product or service” (Kubinec et al., 18 Nov 2025). In the context of dataset construction, greenwashing claims are distinct from generic “environmental claims” in that they are intentionally vague, exaggerated, unverifiable, or contradicted by available evidence. Some corpora (e.g., EmeraldData (Kaoukis et al., 12 Dec 2025)) offer explicit binary labels (Greenwashing, Not Greenwashing), while others rely on intermediate signals: claim presence, commitment specificity, promise-evidence clarity, or actionability.

Datasets range in focus:

  • Environmental claim detection: Flags whether a statement is an environmental claim, typically as a precursor to greenwashing analysis (Stammbach et al., 2022).
  • Claim veracity/justification: Assigns a truthful/refuting label and provides justification (e.g., EmeraldData, (Kaoukis et al., 12 Dec 2025)).
  • Promise verification: Disentangles claim, supporting evidence, and clarity (e.g., ML-Promise, (Seki et al., 2024)).
  • Multimodal framing and ad analysis: Annotates video/image content for frames that may indicate or obscure greenwashing (e.g., oil & gas advertising datasets (Morio et al., 24 Oct 2025)).
  • Aspect-action pairing: Labels sustainability text for concrete vs. vague commitments, supporting cross-category generalization in ESG reporting (Ong et al., 20 Feb 2025).

No single dataset captures the entire greenwashing phenomenon “end-to-end” (Calamai et al., 11 Feb 2025); rather, the field comprises complementary resources addressing constituent tasks.

2. Methods of Dataset Construction and Annotation

Greenwashing claims datasets employ diverse construction and annotation methodologies, reflecting the inherent difficulty in obtaining reliable gold-standard labels. The leading strategies include:

  • LLM-based labeling and claim generation: Datasets such as EmeraldData (Kaoukis et al., 12 Dec 2025) use a single LLM (gemma-27b-it) to generate both truthful and greenwashing claims from news coverage, and to label and justify each claim. Confidence-based filtering retains only claims with high model certainty; a filtering sketch appears after this section.
  • Expert annotation and adjudication: Datasets like Environmental Claim Detection (Stammbach et al., 2022) and ML-Promise (Seki et al., 2024) use multi-annotator labeling with explicit guidelines, followed by majority voting or adjudication to resolve disagreement. ML-Promise further measures Cohen’s κ for inter-annotator agreement (0.60–0.96); a toy κ computation also appears after this section.
  • Crowdsourced and human validation: Some large-scale social media analyses (e.g., Meta ad campaigns (Kubinec et al., 18 Nov 2025)) enlist trained coders to verify a stratified sample, measuring reliability with Fleiss’ κ (≈0.72).
  • Automatic or distant labeling: Video benchmarks for greenwashing often inherit “distant” labels from text annotations (e.g., Facebook ad benchmark (Morio et al., 24 Oct 2025)); post-alignment ensures validity. Sub-task datasets (e.g., Green Claims in Tweets (Calamai et al., 11 Feb 2025)) rely on keyword-driven sampling for annotation efficiency.
  • Annotation schema design: Annotation protocols include binary labels (e.g., Greenwashing/Not Greenwashing, Promise/Evidence), multi-level or multi-label taxonomies (e.g., 13 framing types in multimodal benchmarks), and detailed justificatory fields (e.g., EmeraldData’s one- to two-sentence evidence-based rationale).

The balance between annotation cost and label quality leads to heterogeneity in pipeline complexity, class distributions, and annotator expertise.
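
The confidence-based filtering step described above admits a compact illustration. The following is a minimal sketch, assuming a hypothetical `query_llm` client (the canned response stands in for gemma-27b-it or any other model) and an assumed confidence cutoff; the published pipelines may differ in prompt design and thresholds.

```python
# Minimal sketch of confidence-based filtering for LLM-generated labels.
# `query_llm` is a hypothetical stand-in for a real model client
# (e.g., gemma-27b-it); the threshold below is assumed, not from the paper.
import json

CONFIDENCE_THRESHOLD = 0.8  # assumed cutoff

def query_llm(prompt: str) -> str:
    """Hypothetical model call; returns canned JSON for illustration."""
    return ('{"label": "Greenwashing", "confidence": 0.92, '
            '"justification": "The claim contradicts reported emissions data."}')

def label_claim(claim: str) -> dict | None:
    """Label a claim and keep it only if the model is sufficiently certain."""
    prompt = (
        "Label the claim as Greenwashing or Not Greenwashing. Reply as JSON "
        "with keys: label, confidence (0-1), justification (1-2 sentences).\n"
        f"Claim: {claim}"
    )
    record = json.loads(query_llm(prompt))
    if record.get("confidence", 0.0) < CONFIDENCE_THRESHOLD:
        return None  # drop low-certainty labels, as in confidence filtering
    return {"claim": claim, **record}

print(label_claim("Our fleet is now fully carbon neutral."))
```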
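Inter-annotator agreement statistics such as Cohen’s κ are also easy to reproduce. Below is a toy computation for two annotators; the label lists are invented for illustration and do not come from any of the corpora above.

```python
# Cohen's kappa for two annotators over the same items:
# kappa = (p_o - p_e) / (1 - p_e), observed vs. chance agreement.
from collections import Counter

def cohens_kappa(a: list[str], b: list[str]) -> float:
    assert len(a) == len(b) and a
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n        # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)     # expected by chance
    return (p_o - p_e) / (1 - p_e)

ann1 = ["promise", "promise", "none", "promise", "none"]
ann2 = ["promise", "none",    "none", "promise", "none"]
print(round(cohens_kappa(ann1, ann2), 2))  # ~0.62 on this toy sample
```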

3. Representative Datasets: Properties and Metrics

Key properties of greenwashing claims datasets—including size, label breakdown, data format, and evaluation metrics—are summarized for principal corpora below.

| Dataset | Size / Distribution | Primary Label(s) | Annotation Method | Modality |
|---|---|---|---|---|
| EmeraldData (Kaoukis et al., 12 Dec 2025) | 620 (G: 225, NG: 395) | Greenwashing / Not Greenwashing | LLM (gemma-27b-it) | Text |
| Environmental Claim Detection (Stammbach et al., 2022) | 2,647 (25% claims) | Environmental Claim / No Claim | Experts (4/16) | Text |
| ML-Promise (Seki et al., 2024) | 3,010 (5 languages) | Promise, Evidence, Clarity, Timing | Multilingual experts | Text + Image |
| Multimodal Framing Benchmark (Morio et al., 24 Oct 2025) | 706 videos, 13 label types | Framing labels (e.g., Green Innovation) | Experts, inherited labels | Video + Text |
| A3CG (Ong et al., 20 Feb 2025) | 2,004 statements | Aspect–Action pairs (3-way action) | Domain experts | Text |
| Meta Ad Targeting (Kubinec et al., 18 Nov 2025) | 1.18M ads (≈3% greenwashing) | Bayesian IRT greenwash score, green_label | LLM + humans + regex | Text |

Metrics and evaluation strategies:

  • Standard text metrics: Precision, recall, F1, accuracy are widely reported for claim or aspect extraction tasks (Stammbach et al., 2022, Ong et al., 20 Feb 2025).
  • Selective-prediction metrics: Coverage, selective accuracy (on answered items), and overall accuracy for justified claim labeling; EmeraldData also employs justification quality scoring (ILORA: Informativeness, Logicality, Objectivity, Readability, Alignment Accuracy) (Kaoukis et al., 12 Dec 2025). A code sketch follows this list.
  • IRT-based latent score: Bayesian item response modeling for ads, yielding a continuous greenwashing score θ̂ and class labels based on posterior thresholds (Kubinec et al., 18 Nov 2025); a toy 2PL illustration also follows this list.
  • Cross-category generalization: Micro-F1 on seen versus unseen ESG categories, with gap Δ = F1_unseen − F1_seen for robustness evaluation (Ong et al., 20 Feb 2025).
  • ROUGE-L: For extractive promise/evidence span matching in ML-Promise (Chinese) (Seki et al., 2024).
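
The selective-prediction quantities and the generalization gap reduce to a few lines of code. This is a hedged sketch with placeholder inputs; the abstain token and the micro-averaging choice mirror the descriptions above but are otherwise assumptions.

```python
# Selective-prediction metrics (coverage, selective accuracy, overall
# accuracy) plus the cross-category gap: delta = F1_unseen - F1_seen.
from sklearn.metrics import f1_score

def selective_metrics(preds, golds, abstain="ABSTAIN"):
    answered = [(p, g) for p, g in zip(preds, golds) if p != abstain]
    coverage = len(answered) / len(preds)
    selective_acc = sum(p == g for p, g in answered) / max(len(answered), 1)
    # Abstentions count against overall accuracy:
    overall_acc = sum(p == g for p, g in zip(preds, golds)) / len(preds)
    return coverage, selective_acc, overall_acc

def generalization_gap(gold_seen, pred_seen, gold_unseen, pred_unseen):
    """Micro-F1 gap between unseen and seen ESG categories (A3CG-style)."""
    f1_seen = f1_score(gold_seen, pred_seen, average="micro")
    f1_unseen = f1_score(gold_unseen, pred_unseen, average="micro")
    return f1_unseen - f1_seen
```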
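The IRT-based score can be illustrated with the standard two-parameter logistic (2PL) link, under the assumption that the published model uses a form like this; the numbers below are toy values, not fitted posteriors.

```python
# Toy 2PL item-response illustration: each ad has a latent greenwash score
# theta, and each indicator item j has discrimination a_j and difficulty b_j.
import numpy as np

def p_endorse(theta, a, b):
    """P(ad with score theta triggers indicator (a, b)) under the 2PL link."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta_hat = np.array([-1.2, 0.3, 2.1])   # hypothetical scores for three ads
items = [(1.5, 0.0), (0.8, 1.0)]         # hypothetical (a_j, b_j) pairs

for a, b in items:
    print(p_endorse(theta_hat, a, b))

# Discrete labels from a threshold on the latent score (threshold assumed):
green_label = theta_hat > 1.0             # -> [False, False, True]
```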

4. Data Modalities and Downstream Applications

Greenwashing claims datasets span multiple data modalities, supporting research across traditional NLP, vision-language models (VLMs), and hybrid domains:

  • Text-only corpora dominate early datasets, focusing on financial reports, ESG filings, news, and social media posts, enabling fine-grained tasks like claim identification, specificity analysis, and actionability extraction (Stammbach et al., 2022, Ong et al., 20 Feb 2025).
  • Vision-language corpora emerge in recent benchmarks, incorporating video frames, transcripts, and contextual framing to study implicit and multimodal greenwashing cues, especially in PR and advertising (Morio et al., 24 Oct 2025). Video datasets annotate both explicit and impressionistic framing dimensions.
  • Multilingual and multicultural coverage is provided by resources such as ML-Promise, which annotates ESG disclosures in five languages and across regulatory regimes (Seki et al., 2024). Meta Ad Targeting includes campaign assets in 13 languages (Kubinec et al., 18 Nov 2025).
  • Applications: These datasets underpin benchmarking of detection and justification models, specificity and actionability analysis, cross-lingual and cross-category generalization studies, and empirical policy analysis of advertising campaigns (see Section 7).

5. Limitations and Open Challenges

The construction, annotation, and use of greenwashing claims datasets present several challenges:

  • Label reliability: Human annotator agreement varies from moderate to high (Krippendorff’s α=0.47–0.82; Cohen’s κ=0.60–0.96). LLM annotators or distant supervision introduce potential biases; no dataset achieves universally accepted ground truth (Kaoukis et al., 12 Dec 2025, Stammbach et al., 2022, Seki et al., 2024, Calamai et al., 11 Feb 2025).
  • Domain and modality coverage: Early datasets are predominantly English and document-centric; recent efforts mitigate this with multilingual annotation (ML-Promise) and cross-modal representation (multimodal framing benchmarks).
  • Granularity: Most labels are binary or categorical; severity and risk gradations (e.g., minor vs. egregious greenwashing) are uncommon but identified as important future directions (Seki et al., 2024).
  • Contextualization: Many claims require document- or campaign-level context. Sentence-level or paragraph-level annotation may fail to capture cross-sentence dependencies or corroborating/contradictory evidence (Calamai et al., 11 Feb 2025).
  • Subjectivity in labels: Vague, implicit, or highly contextual claims and frames are annotated with subjectivity, leading to lower agreement, especially for specificity, clarity, or impressionistic tags (Morio et al., 24 Oct 2025).
  • Absence of direct greenwashing labels in many corpora: The majority of earlier datasets address only the presence of “green” claims, commitment strength, or specificity; only recent corpora produce explicit greenwashing/no-greenwashing ground truth (Calamai et al., 11 Feb 2025).

6. Accessibility and Licensing

Principal greenwashing claims datasets are released under open-access or permissive licenses, with code and detailed documentation enabling reproducibility:

  • EmeraldData (Kaoukis et al., 12 Dec 2025): Released via Athena Research Center (Apache 2.0–style license, forthcoming on GitHub).
  • Environmental Claim Detection (Stammbach et al., 2022): MIT license; available on GitHub and HuggingFace.
  • A3CG (Ong et al., 20 Feb 2025): Publicly available with full protocol and code.
  • Meta Ad Targeting dataset (Kubinec et al., 18 Nov 2025): Data, code, regex lists, model scripts, and IRT outputs are archived openly (DOI, supplement), with raw Meta Ad Library access via public API or authors upon request.
  • ML-Promise (Seki et al., 2024): Released with detailed annotation protocol and code for benchmarking.
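
Since several corpora are hosted on public hubs, loading them is typically a one-liner. A minimal sketch with the Hugging Face `datasets` library follows; the hub identifier is an assumption, so check the dataset card linked from the Stammbach et al. (2022) repository for the exact name.

```python
# Load the environmental claim detection corpus from the Hugging Face Hub.
from datasets import load_dataset

ds = load_dataset("climatebert/environmental_claims")  # assumed hub id
print(ds["train"][0])  # one annotated sentence with its binary claim label
```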

A plausible implication is that, given the growing diversity of available greenwashing claims datasets, future research can systematically benchmark detection and justification models, analyze generalization across ESG domains and cultures, and develop practical, explainable tools for regulators and auditors.

7. Relation to Broader Greenwashing Detection and Research Directions

Greenwashing claims datasets form the empirical substrate for a rapidly growing ecosystem of detection methodologies:

  • Retrieval-augmented and knowledge-graph models: EmeraldMind integrates a domain-specific ESG knowledge graph with LLMs for claim verification and achieves justification-centric predictions without fine-tuning (Kaoukis et al., 12 Dec 2025).
  • Multi-signal frameworks: Surveys reveal that no single dataset is sufficient; pipeline architectures combine green-claim detection, actionability, specificity, and evidence matching (Calamai et al., 11 Feb 2025). A minimal composition sketch follows this list.
  • Multimodal and cross-linguistic generalization: Recent benchmarks emphasize vision-language modeling, multi-language protocols, and robustness to new greenwashing strategies as key frontiers (Seki et al., 2024, Morio et al., 24 Oct 2025, Ong et al., 20 Feb 2025).
  • Policy relevance: Large-scale ad datasets enable empirical study of political and strategic deployment of greenwashing, including targeted campaigns and actor network analysis (Kubinec et al., 18 Nov 2025).
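
The multi-signal idea referenced above can be made concrete: separate detectors feed a simple risk aggregator. Everything here is hypothetical, including the component signals and the equal weighting; real systems would learn the composition from data.

```python
# Hypothetical composition of a multi-signal greenwashing pipeline:
# a claim is risky when it is "green", vague, and poorly evidenced.
from dataclasses import dataclass

@dataclass
class Signals:
    is_green_claim: bool
    specificity: float       # 0 = vague, 1 = concrete commitment
    evidence_support: float  # 0 = unsupported, 1 = well-evidenced

def greenwashing_risk(s: Signals) -> float:
    """Equal-weight aggregation; weights here are illustrative only."""
    if not s.is_green_claim:
        return 0.0
    return 0.5 * (1 - s.specificity) + 0.5 * (1 - s.evidence_support)

print(greenwashing_risk(Signals(True, specificity=0.2, evidence_support=0.1)))  # ~0.85
```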

Continued expansion in dataset scale, annotation depth, and multimodal coverage is required to capture the complexity and evolving tactics of greenwashing in corporate, political, and public communication.
