Argumentation Mining in User-Generated Web Discourse (1601.02403v5)

Published 11 Jan 2016 in cs.CL

Abstract: The goal of argumentation mining, an evolving research field in computational linguistics, is to design methods capable of analyzing people's argumentation. In this article, we go beyond the state of the art in several ways. (i) We deal with actual Web data and take up the challenges given by the variety of registers, multiple domains, and unrestricted noisy user-generated Web discourse. (ii) We bridge the gap between normative argumentation theories and argumentation phenomena encountered in actual data by adapting an argumentation model tested in an extensive annotation study. (iii) We create a new gold standard corpus (90k tokens in 340 documents) and experiment with several machine learning methods to identify argument components. We offer the data, source codes, and annotation guidelines to the community under free licenses. Our findings show that argumentation mining in user-generated Web discourse is a feasible but challenging task.

Authors (2)

Ivan Habernal (30 papers)
Iryna Gurevych (264 papers)

Citations (258)

Summary

The paper proposes a data-driven methodology that separates persuasive from non-persuasive documents, based on annotation studies of Web discourse about controversial education topics.
It adapts Toulmin's argumentation model to capture diverse components like claims, premises, and informal devices such as rhetorical questions.
The machine learning system leverages word embeddings and sentiment analysis to outperform baselines, despite challenges in detecting complex rebuttals and refutations.

Argumentation Mining in User-Generated Web Discourse

The paper by Habernal and Gurevych examines the emerging field of argumentation mining within the context of the diverse and often unruly landscape of user-generated web discourse. The paper seeks to bridge theoretical models of argumentation with practical computational approaches to identifying and analyzing arguments in online content. The authors focus specifically on six controversial topics in education across four different registers—from comments to forums, blogs, and articles—to assess the communicative structure of online argumentation.

The authors propose a data-driven methodology for identifying persuasive content and understanding argument structures within these diverse registers. They first create a sizable corpus of nearly 700,000 tokens across more than 5,000 documents, which serves as the basis for two annotation studies. The first annotation study identifies documents that contain persuasive content, since not all documents related to controversial education topics are inherently argumentative. Documents are thus classified as persuasive or non-persuasive, providing a gold-standard dataset for subsequent analysis.

Central to the paper is the adaptation of Toulmin's original model of argument, which is extended and modified to better fit the idiosyncrasies of web discourse. The authors rightly acknowledge the challenge of applying this model, traditionally used to depict well-structured arguments, to user-generated content that often lacks formal uniformity. The modified Toulmin model comprises five argument components: claims, premises, backings, rebuttals, and refutations. Annotations produced independently by multiple annotators reach moderate inter-annotator agreement. The model thus serves as a framework for annotating argument components and, subsequently, for recognizing them computationally.

A second annotation study focuses on these argument components as defined by the proposed model. The authors produce a gold-standard corpus of about 90,000 tokens in 340 documents, annotated at the token level. They observe that implicitly stated claims and non-traditional argumentation strategies, such as rhetorical questions and narratives, are prevalent in user-generated content.
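To make token-level annotation concrete, the sketch below converts component spans into one label per token using BIO encoding, a common scheme for sequence labeling. The span layout `(start, end_exclusive, component)`, the function name `spans_to_bio`, the label spelling (`claim-B`, `claim-I`), and the example sentence are all assumptions made for illustration; they are not taken from the paper's released data format.

```python
from typing import List, Tuple

# Component names follow the modified Toulmin model described in the summary.
COMPONENTS = {"claim", "premise", "backing", "rebuttal", "refutation"}

# Hypothetical span layout: (start_token, end_token_exclusive, component).
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Convert non-overlapping component spans into one BIO label per token."""
    labels = ["O"] * num_tokens  # "O" marks tokens outside any component
    for start, end, component in sorted(spans):
        if component not in COMPONENTS:
            raise ValueError(f"unknown component: {component}")
        if not (0 <= start < end <= num_tokens):
            raise ValueError(f"span out of range: {(start, end)}")
        if any(label != "O" for label in labels[start:end]):
            raise ValueError(f"overlapping span: {(start, end)}")
        labels[start] = f"{component}-B"  # first token of the component
        for i in range(start + 1, end):
            labels[i] = f"{component}-I"  # remaining tokens of the component
    return labels


# Invented example sentence, not drawn from the corpus.
tokens = "Homeschooling should be banned because kids need peers .".split()
spans = [(0, 4, "claim"), (5, 8, "premise")]
for token, label in zip(tokens, spans_to_bio(len(tokens), spans)):
    print(f"{token}\t{label}")
```

Encoding spans this way turns component identification into a per-token classification problem, which is what allows standard sequence labeling methods to be applied.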

From a computational perspective, the work explores multiple feature sets and classifiers to automate argumentation mining. The authors' machine learning-based system outperforms simple baselines, particularly when it uses rich feature sets that combine word embeddings, sentiment analysis, and structural linguistic features. However, the system struggles to identify complex components such as rebuttals and refutations, which suggests avenues for further research and model refinement.
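A comparison between a system and a simple baseline can be sketched as follows, assuming token-level labels and macro-averaged F1 as the metric. The metric choice, the all-"O" baseline, the function names, and the toy label sequences are assumptions made for illustration; none of the numbers correspond to results reported in the paper.

```python
from typing import Dict, List


def f1_per_label(gold: List[str], pred: List[str]) -> Dict[str, float]:
    """Token-level F1 for every label that occurs in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores: Dict[str, float] = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(gold: List[str], pred: List[str]) -> float:
    """Unweighted mean of per-label F1, so rare labels count as much as frequent ones."""
    scores = f1_per_label(gold, pred)
    return sum(scores.values()) / len(scores)


# Toy data, invented for this sketch.
gold = ["claim", "claim", "O", "premise", "premise", "rebuttal"]
baseline = ["O"] * len(gold)  # trivial baseline: label every token "O"
system = ["claim", "claim", "O", "premise", "O", "O"]  # misses the rebuttal

print("baseline macro-F1:", round(macro_f1(gold, baseline), 3))
print("system macro-F1:  ", round(macro_f1(gold, system), 3))
print("system per-label: ", f1_per_label(gold, system))
```

Macro averaging is used here because it exposes weak performance on rare labels: a system that never predicts a rebuttal scores zero on that label regardless of how well it handles frequent ones.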

The implications of Habernal and Gurevych's research are significant for both theory and practice. Theoretically, by providing a refined model grounded in empirical data, the paper enriches argumentation theory's account of argument types and structures in less formal settings. Practically, the work informs the design of automated systems that can aggregate, summarize, and evaluate public opinion in online forums, a growing need as digital discourse expands and influences decision-making in various domains.

Future prospects for research include enhancing detection algorithms for more fluid argument structures, investigating additional emotional and stylistic dimensions of arguments, and improving the cross-register applicability of machine learning models. By continuing to integrate computational techniques with insights from argumentation theory, more robust systems capable of automatically analyzing the argumentative quality of web discourse could be developed.
