
A Greek Parliament Proceedings Dataset for Computational Linguistics and Political Analysis (2210.12883v1)

Published 23 Oct 2022 in cs.CL and cs.AI

Abstract: Large, diachronic datasets of political discourse are hard to come across, especially for resource-lean languages such as Greek. In this paper, we introduce a curated dataset of the Greek Parliament Proceedings that extends chronologically from 1989 up to 2020. It consists of more than 1 million speeches with extensive metadata, extracted from 5,355 parliamentary record files. We explain how it was constructed and the challenges that we had to overcome. The dataset can be used for both computational linguistics and political analysis, ideally combining the two. We present such an application, showing (i) how the dataset can be used to study the change of word usage through time, (ii) between significant historical events and political parties, (iii) by evaluating and employing algorithms for detecting semantic shifts.

Authors (4)
  1. Konstantina Dritsa (3 papers)
  2. Kaiti Thoma (1 paper)
  3. John Pavlopoulos (31 papers)
  4. Panos Louridas (8 papers)
Citations (1)