Global Populism Database (GPD)

Updated 13 October 2025

GPD is a comprehensive database that systematically measures populist rhetoric in global political speeches using a 0–2 holistic grading scale.
It employs a detailed annotation protocol with rubric-based chain-of-thought prompting, validated through agreement metrics such as Pearson correlation and the intraclass correlation coefficient (ICC).
The platform enables scalable, transparent, and comparative analysis of populist framing, facilitating robust political science research.

The Global Populism Database (GPD) is a comprehensive empirical resource designed to systematically measure and compare the ideational content of populist rhetoric in political speeches given by leaders and candidates worldwide. The GPD employs a rigorous annotation protocol—rooted in holistic grading methods—to produce a cross-national, multilingual, and temporally extended panel of populism scores, supporting both foundational research and state-of-the-art computational applications in political science (Tamaki et al., 8 Oct 2025).

1. Data Sources, Structure, and Annotation Protocols

The GPD aggregates political speeches from global leaders across dozens of countries and multiple decades, with coverage including campaign, famous, international, and ceremonial addresses. The annotation protocol is based on holistic grading (HG): each speech is read in full and scored on a 0–2 continuum, where 0 corresponds to negligible populist content and 2 reflects strong and pervasive populism. Coders follow a detailed rubric and anchor speech documentation that includes multiple exemplar speeches from canonical leaders (Tony Blair, George Bush, Barack Obama, Stephen Harper, Sarah Palin, Robert Mugabe, Evo Morales, etc.). The rubric explicitly describes at least six pro-populist and pluralist criteria to guide consistent annotation. For balanced coverage along the scale, anchor speeches illustrating moderate populism are incorporated during training.

The GPD’s data structure encodes both raw texts and metadata, organizing entries by country, leader, speech type, and language. Each annotation is produced by coders trained using a protocol mirroring the chain-of-thought reasoning expected for context-sensitive, ideational content (Tamaki et al., 8 Oct 2025).
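One way to picture the entry layout described above is a small record type plus a grouping helper. This is a minimal sketch only: the field names, the `SpeechRecord` class, and the sample rows are assumptions made for illustration and do not reflect the GPD's actual schema or data.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class SpeechRecord:
    """One graded speech. Field names are hypothetical, not the GPD schema."""
    country: str
    leader: str
    speech_type: str  # e.g. "campaign", "famous", "international", "ceremonial"
    language: str
    text: str
    score: float      # holistic grade on the 0-2 scale

    def __post_init__(self):
        # Reject grades that fall outside the holistic grading scale.
        if not 0.0 <= self.score <= 2.0:
            raise ValueError(f"score {self.score} is outside the 0-2 scale")


def mean_score_by_leader(records):
    """Average the holistic grades for each (country, leader) pair."""
    grouped = defaultdict(list)
    for record in records:
        grouped[(record.country, record.leader)].append(record.score)
    return {key: mean(scores) for key, scores in grouped.items()}


# Placeholder rows, invented for the example.
records = [
    SpeechRecord("Country A", "Leader X", "campaign", "en", "...", 1.5),
    SpeechRecord("Country A", "Leader X", "ceremonial", "en", "...", 0.5),
    SpeechRecord("Country B", "Leader Y", "international", "es", "...", 0.0),
]
averages = mean_score_by_leader(records)
```

Averaging across speech types per leader is one common way to turn speech-level grades into a leader-level panel; other aggregation rules are equally possible.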

2. AI-Driven Measurement: Chain-of-Thought Prompting and Calibration

Recent work demonstrates that rubric- and anchor-guided chain-of-thought (CoT) prompting enables LLMs to replicate expert human coders in the GPD setting (Tamaki et al., 8 Oct 2025). Specifically, LLMs receive the full coder documentation, including theoretical definitions, grading instructions, rubric schemas, anchor speeches with their scores, and integration instructions. The models then read a test speech, reflect on the rubric and anchor examples, and output both a score (on the 0–2 scale) and a detailed reasoning chain.
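The prompting flow described above can be sketched as two small helpers: one that assembles the documentation, anchors, and test speech into a single prompt, and one that extracts the final score from the model's reply. This is an illustrative sketch, not the prompt used in the cited study: the section labels, the `SCORE:` output convention, and the function names are all assumptions, and the call to an actual LLM is deliberately left out.

```python
import re


def build_prompt(definitions, instructions, rubric, anchors, speech):
    """Concatenate documentation, anchor speeches, and the test speech.

    `anchors` is a list of (speech_text, score) tuples. The layout and the
    requested 'SCORE:' line are hypothetical conventions for this sketch.
    """
    parts = [
        "Theoretical definitions:\n" + definitions,
        "Grading instructions:\n" + instructions,
        "Rubric:\n" + rubric,
    ]
    for i, (text, score) in enumerate(anchors, start=1):
        parts.append(f"Anchor speech {i} (score: {score:.1f}):\n{text}")
    parts.append("Speech to grade:\n" + speech)
    parts.append(
        "Reason step by step against the rubric and the anchors, then end "
        "with a line of the form 'SCORE: <number between 0 and 2>'."
    )
    return "\n\n".join(parts)


# Matches a line consisting only of 'SCORE: <number>'.
SCORE_RE = re.compile(r"^SCORE:\s*([0-9]+(?:\.[0-9]+)?)\s*$", re.MULTILINE)


def parse_score(reply):
    """Return the last reported score, or None if missing or outside 0-2."""
    matches = SCORE_RE.findall(reply)
    if not matches:
        return None
    value = float(matches[-1])
    return value if 0.0 <= value <= 2.0 else None


prompt = build_prompt(
    "definitions text", "instructions text", "rubric text",
    [("anchor text", 2.0), ("another anchor", 0.0)],
    "test speech text",
)
```

Keeping the reasoning chain in the reply and parsing only the final line preserves the audit trail while still yielding a numeric grade. Swapping the rubric and anchor arguments is all that is needed to point the same helper at a different construct.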

Models such as GPT-5 (high reasoning mode) and Qwen3 235B (reasoning-enabled) achieve classification accuracy on par with human coders. Agreement is evaluated with Pearson correlation, Spearman rank correlation, the intraclass correlation coefficient (ICC), Lin’s concordance correlation coefficient (CCC), Krippendorff’s α, and Bland–Altman analysis. Calibration is assessed via a regression of the form:

$\mathrm{Human} \approx a + b \times \mathrm{AI},$

with $a \approx 0$ and $b \approx 1$ denoting optimal alignment (Tamaki et al., 8 Oct 2025).
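Two of the agreement statistics and the calibration regression can be computed directly from paired score lists. The following is a minimal sketch in plain Python using the standard textbook formulas; the function names and the sample scores are invented for illustration and are not results from the cited paper.

```python
from math import sqrt


def _mean(values):
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lin_ccc(x, y):
    """Lin's CCC: 2*cov / (var_x + var_y + (mean_x - mean_y)^2).

    Uses population (divide-by-n) moments. Unlike Pearson, it penalizes
    shifts in location and scale, not only departures from linearity.
    """
    n = len(x)
    mx, my = _mean(x), _mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    return 2 * cov / (var_x + var_y + (mx - my) ** 2)


def calibration(ai, human):
    """Ordinary least squares fit of human = a + b * ai; returns (a, b)."""
    mx, my = _mean(ai), _mean(human)
    sxy = sum((x - mx) * (y - my) for x, y in zip(ai, human))
    sxx = sum((x - mx) ** 2 for x in ai)
    b = sxy / sxx
    a = my - b * mx
    return a, b


# Synthetic paired scores on the 0-2 scale (made up for the example).
human = [0.0, 0.5, 1.0, 1.5, 2.0]
ai = [0.1, 0.5, 1.0, 1.4, 1.9]

r = pearson(ai, human)
ccc = lin_ccc(ai, human)
a, b = calibration(ai, human)
```

In this synthetic case the AI scores are slightly compressed, so the fitted slope `b` comes out above 1 and the CCC falls below the Pearson correlation, which is the pattern the calibration check is meant to expose.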

3. Model Diversity, Implementation, and Error Analysis

A broad set of LLMs has been evaluated on the GPD replication task, ranging from proprietary (GPT-5) to advanced open-weight models (GPT-oss 120B, DeepSeek R1/V3, Qwen3 235B, Llama 4 Maverick/Scout). Model selection varies both reasoning depth (full chain-of-thought versus minimal reasoning) and architecture (dense and mixture-of-experts). Empirical results indicate that top reasoning-enabled LLMs show high reliability and agreement with human scores, while smaller models or models without reasoning show stronger scale compression and lower reliability. A modest tendency for scores to regress toward the mean is observed: extreme scores are shifted toward the center, but the rank order of speeches is preserved (Tamaki et al., 8 Oct 2025).
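The combination of scale compression and preserved ranking can be made concrete with a synthetic example. In the sketch below, hypothetical AI scores are produced by pulling human scores toward the scale midpoint by a factor `k`; the helper names, the value `k = 0.8`, and the scores themselves are assumptions chosen for illustration, not figures from the study.

```python
def ranks(values):
    """Rank positions (1 = smallest). Assumes no ties, for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def compress(scores, k, center):
    """Pull each score toward `center` by a factor k, with 0 < k < 1."""
    return [center + k * (s - center) for s in scores]


def ols(x, y):
    """Least squares fit of y = a + b * x; returns (a, b)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    b = sxy / sxx
    return my - b * mx, b


# Synthetic human grades and a compressed "AI" version of them.
human = [0.0, 0.4, 0.9, 1.5, 2.0]
ai = compress(human, k=0.8, center=1.0)

same_order = ranks(ai) == ranks(human)  # ranking survives compression
a, b = ols(ai, human)                   # Human = a + b * AI
```

Because the compressed scores are an increasing linear function of the human scores, the ranking is unchanged while the calibration slope becomes 1/k (about 1.25 here, with an intercept of about -0.25). A slope above 1 in the regression of human on AI scores is therefore a simple signature of compression, and the fitted line can serve as a linear recalibration.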

When evaluated across 12 speeches from the UK, Turkey, and Montenegro, high-capacity, reasoning-enabled LLMs achieve strong agreement with human coding. Models with less capacity or less prompt adaptation exhibit larger mean absolute error and poor scale calibration.
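Mean absolute error and the Bland–Altman summary are simple to compute from paired scores. The sketch below uses the conventional bias ± 1.96 standard deviations for the limits of agreement; the function names and the sample scores are illustrative assumptions rather than values from the evaluation.

```python
from statistics import mean, stdev


def mean_absolute_error(ai, human):
    """Average absolute gap between AI and human scores."""
    return mean(abs(a - h) for a, h in zip(ai, human))


def bland_altman(ai, human):
    """Return (bias, lower limit, upper limit) of agreement.

    Bias is the mean of the differences (AI minus human); the limits are
    bias +/- 1.96 times the sample standard deviation of the differences.
    """
    diffs = [a - h for a, h in zip(ai, human)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Synthetic scores on the 0-2 scale (made up for the example).
human = [0.0, 0.5, 1.0, 1.5, 2.0]
ai = [0.2, 0.5, 0.9, 1.3, 1.7]

mae = mean_absolute_error(ai, human)
bias, lower, upper = bland_altman(ai, human)
```

A negative bias in this convention means the AI grades sit below the human grades on average; with only a handful of speeches the limits of agreement are wide and should be read with caution.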

4. Methodological Advances and Utility for Populism Research

The GPD’s anchor-guided, holistic grading methodology provides an interpretative framework for coders and AI alike to identify latent patterns such as anti-elitism, people-centrism, and Manichaean framing. The explicit chain-of-thought prompting mimics human deliberation and ensures the model reasons through complex, context-sensitive rhetorical phenomena. The resulting automated approach is cost-effective, scalable, and language-agnostic, facilitating large-scale comparative content analysis beyond traditional keyword or dictionary methods.

The method’s transparency and reproducibility are enhanced by requiring LLMs to output detailed reasoning chains, which document the basis for each populism score and enable auditability of AI-driven grading. Adaptability is ensured by swapping rubric and anchor sets, allowing extension to related constructs (e.g., nationalist or crisis framings).

5. Global Coverage, Comparative Analysis, and Extension Potential

The GPD’s multilingual and cross-contextual span supports comparative research into both temporal and spatial variation in populist rhetoric. Its grading protocol is demonstrably robust for diverse speech types and cultural frames: anchor speech training coupled with rubric consistency enables both human and AI coders to operate reliably across shifting political landscapes.

A plausible implication is that the GPD can be extended to measure additional rhetorical constructs or be integrated with computational models such as PopBERT (Erhard et al., 2023), multi-task learning architectures (Huguet-Cabot et al., 2021), and network-based approaches (Garcia-Arteaga et al., 2021). By enabling both dynamic tracking and historical panel analysis, the GPD facilitates the study of populism at scale and with methodological rigor.

6. Limitations and Future Directions

The GPD annotation approach yields high inter-coder reliability and strong AI-human concordance; however, modest scale compression and a minor negative bias may require additional calibration when expanding to new contexts or speech genres. The annotation process is sensitive to anchor selection, rubric clarity, and documentation adaptation. Future methodological refinements may include more granular multi-label schemes, explicit sub-frame detection (e.g., host ideologies, per Erhard et al., 2023), and hybrid quantitative-qualitative approaches.

Extending the GPD protocol to cover related phenomena—such as nationalist discourse, crisis language, or emotion-infused populism—would capitalize on the demonstrated robustness of chain-of-thought LLMs and the anchor-rubric methodology. Scalable, automated coding guided by documented reasoning chains appears poised to underpin the next era of comparative rhetorical analysis in global political research.

7. Summary Table: Key Features of the Global Populism Database (GPD)

Dimension	GPD Protocol	Computational Integration
Annotation method	Holistic Grading (0–2 score)	Chain-of-thought LLM grading
Reference materials	Rubric & anchor speeches	Explicit documentation adaptation
Model evaluation	Pearson, ICC, CCC, Krippendorff’s α	Calibration regression
Comparative scope	Multinational, multilingual	AI-aided cross-context analysis
Reliability	High for reasoning-enabled models	Sensitivity to prompt & anchors
Extensibility	New frames via rubric swap	Cross-domain adaptation

The Global Populism Database establishes an authoritative, empirically grounded platform for the measurement and comparative analysis of populist speech at global scale. It integrates anchor-driven annotation, holistic grading, and advanced AI reasoning to yield reproducible, transparent, and scalable indicators of ideational rhetoric, facilitating robust political science research and real-time content analysis (Tamaki et al., 8 Oct 2025).