Discourse Relation Classification

Updated 16 September 2025
  • Discourse relation classification is the process of automatically identifying logical or semantic connections between text spans, crucial for coherent discourse in NLP.
  • The approach uses a multi-stage pipeline with maximum entropy classifiers, integrating features from both syntactic parse trees and lexico-syntactic cues to detect explicit and implicit relations.
  • Evaluation results show high performance in explicit relation detection while highlighting challenges like error propagation in classifying implicit relations.

Discourse relation classification is the task of automatically identifying the type of logical or semantic relation that holds between two spans of text, typically sentences or clauses. These relations, such as causal, temporal, contrastive, or elaborative, form the structural backbone of coherent discourse and are crucial for applications in discourse parsing, question answering, summarization, and argument mining. In computational linguistics, discourse relation classification encompasses both the detection of the presence of a relation (explicit or implicit) and its precise labeling within a given taxonomy, often following frameworks such as the Penn Discourse Treebank (PDTB) or Rhetorical Structure Theory (RST).

1. Discourse Relation Classification Pipeline and PDTB-Style Parsing

A foundational approach to discourse relation classification is the PDTB-styled end-to-end discourse parser (Lin et al., 2010). This system implements a multi-stage pipeline closely mirroring the PDTB annotation process. The input is raw text, and the output is a structured representation with labeled discourse relations and their argument spans (Arg1, Arg2), including attribution information where relevant.

The principal stages of the pipeline are:

  • Step 1: Identify Explicit Relations. Candidate discourse connectives are detected using both lexical and syntactic features, then classified for their discourse function. An argument labeler determines if Arg1 occurs in the same sentence (SS) or a previous sentence (PS), followed by argument span extraction. An explicit classifier, leveraging connective string, POS, and contextual cues, assigns predefined explicit relation labels.
  • Step 2: Identify Non-Explicit Relations. For adjacent sentences without explicit connectives, the system classifies the relation as Implicit, AltLex, EntRel, or NoRel, using features from surrounding context, parse structures, and specific cues (e.g., first words of Arg2).
  • Step 3: Attribution Span Detection. A clause splitter segments the text; each clause is then classified as attributional or not, completing the discourse structure as per PDTB guidelines.

These stages are implemented as separate statistical classifiers. Maximum entropy classifiers are used throughout (e.g., for connective recognition, argument position, relation classification), relying on both syntactic features (parse paths, compressed syntax paths) and lexico-syntactic features (e.g., tags of connectives and their neighbors).
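Under stated assumptions, the control flow of Steps 1 and 2 can be sketched as follows. The connective lexicon, sense labels, and function name are illustrative stand-ins for the trained maximum entropy components, and Step 3 (attribution) is omitted:

```python
# Toy sketch of the Step 1 / Step 2 control flow. The connective lexicon
# and sense labels are illustrative; the real parser uses trained
# maximum entropy classifiers at each decision point.
CONNECTIVE_SENSES = {"because": "Contingency", "but": "Comparison", "then": "Temporal"}

def parse_discourse(sentences):
    relations = []
    explicitly_linked = set()

    # Step 1: explicit relations, triggered by a connective in the sentence.
    for i, sent in enumerate(sentences):
        for tok in sent.lower().replace(",", " ").replace(".", " ").split():
            if tok in CONNECTIVE_SENSES:
                relations.append(("Explicit", CONNECTIVE_SENSES[tok], i))
                explicitly_linked.add(i)

    # Step 2: adjacent sentence pairs not covered by Step 1 receive a
    # non-explicit label (Implicit / AltLex / EntRel / NoRel in the PDTB).
    for i in range(len(sentences) - 1):
        if i + 1 not in explicitly_linked:
            relations.append(("NonExplicit", i, i + 1))

    return relations
```

In the actual system, each appended tuple would carry full Arg1/Arg2 spans rather than sentence indices, and every decision point is a statistical classifier rather than a lookup.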

2. Algorithms, Pseudocode, and Mathematical Modeling

The classification and extraction components are formalized using pseudocode and mathematical notation:

  • The argument labeler, for a given connective C in text T, first predicts Arg1's position as SS or PS and selects spans using methods such as tree subtraction or previous-sentence selection.
  • Feature descriptions and classifier decisions are captured using symbols such as T (text), C (connective), and attributes such as "prev" denoting previous sentences.

Maximum entropy models assign probabilities to classes (e.g., SS/PS or relation types) based on the engineered feature vectors, with the classifier learning weight parameters that maximize the conditional log-likelihood of the observed labels. Formal notation serves to present the pipeline's sequential logic rather than to derive novel statistical models.
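In the standard formulation, a maximum entropy classifier models the conditional class distribution as

```latex
P(y \mid x) \;=\; \frac{\exp\left(\sum_i w_i\, f_i(x, y)\right)}
                       {\sum_{y'} \exp\left(\sum_i w_i\, f_i(x, y')\right)},
```

where the $f_i(x, y)$ are indicator features over the engineered cues described above (connective string, POS, parse paths) and the weights $w_i$ are fit by maximizing $\sum_j \log P(y_j \mid x_j)$ over the training data.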

3. Feature Engineering and Component Design

A distinctive aspect of earlier systems is their dependence on rich, manually engineered features:

  • Syntactic features: Nodes and paths in the constituent parse tree, compressed paths, and specific relations between connectives and argument head nodes.
  • Lexico-syntactic features: Surface forms and POS tags of connectives, adjacent word-POS n-grams, and lexicalized production rule patterns.

For non-explicit relation classification, features draw from both constituent/dependency parses and word-pair-based clues—most notably, the first few words of Arg2, which play a substantial role in AltLex relation identification. Error analysis demonstrates that ambiguous connectives like "and" often precipitate false positives in early pipeline components.
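A minimal sketch of this style of lexico-syntactic feature extraction, with POS tags hard-coded as stand-ins for parser output and illustrative feature names:

```python
# Illustrative lexico-syntactic features for a candidate connective at
# position idx. POS tags are taken as given here; the real system reads
# them off an automatic constituent parse.
def connective_features(tokens, pos_tags, idx):
    conn, conn_pos = tokens[idx], pos_tags[idx]
    prev_tok = tokens[idx - 1].lower() if idx > 0 else "<S>"
    prev_pos = pos_tags[idx - 1] if idx > 0 else "<S>"
    next_tok = tokens[idx + 1].lower() if idx + 1 < len(tokens) else "</S>"
    return {
        "conn": conn.lower(),                       # connective surface form
        "conn_pos": conn_pos,                       # its POS tag
        "prev_conn": f"{prev_tok}_{conn.lower()}",  # left-neighbor word pair
        "prev_pos": prev_pos,                       # left-neighbor POS
        "conn_next": f"{conn.lower()}_{next_tok}",  # right-neighbor word pair
    }
```

For example, `connective_features(["The", "game", "went", "on", "because", "fans", "stayed"], ["DT", "NN", "VBD", "RP", "IN", "NNS", "VBD"], 4)` yields features keyed on "because" and its neighbors; the full system adds syntactic path features from the parse tree.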

4. Evaluation Metrics and Empirical Findings

The parser and its subcomponents are evaluated using standard metrics—Accuracy, Precision, Recall, and F1—under both component-wise and end-to-end (error-cascading) regimes:

Component            Metric         GS+noEP (F1)   GS+EP (F1)   Automation (F1)
Connective Class.    Acc/F1         97.25          --           --
Arg Pos Classifier   F1             97.94          92.09        --
Arg Extractor        F1 (partial)   94.02          91.87        71.00
Explicit Classifier  F1             86.77          --           --
Non-Explicit Class.  F1             20–40          --           --
Attribution Span     F1 (partial)   79.68          --           --
Relation (E2E)       F1 (partial)   46.80          38.18        33–47
  • "GS" denotes gold standard input, "noEP" is without error propagation, "EP" is with error propagation, and "Automation" is full automation.
  • The system performs well in components such as connective and explicit relation classification, but the multilayered pipeline is susceptible to cascading errors, especially in argument extraction.
  • The explicit relation classifier achieves considerably higher F1 than the non-explicit classifier, confirming the established difficulty of implicit discourse relation classification.

Paired t-tests show statistically significant improvements (p < 0.001) when additional contextual features are included in the pipeline. Partial matches (crediting overlap of key headwords) markedly increase measured F1 scores, as exact argument boundary matches are difficult due to the PDTB’s minimality principle.
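The reported metrics follow the standard definitions; a minimal implementation:

```python
# Standard precision / recall / F1 over true positives, false positives,
# and false negatives. "Partial" matching only changes what counts as a
# true positive (headword overlap vs. exact spans), not these formulas.
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

This makes the effect of partial matching transparent: relaxing the span-match criterion converts near-miss extractions from fp/fn pairs into tp counts, raising both precision and recall.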

5. Error Propagation and Limiting Factors

End-to-end evaluation demonstrates that accuracy and recall degrade with error propagation. The principal sources of error are:

  • Ambiguous Connectives: Words like "and" are frequent false positives.
  • Argument Boundary Detection: Extraction errors, especially under exact span matching, are nontrivial.
  • Cascading Component Errors: The sequential pipeline magnifies upstream mistakes, reducing overall structure-level F1.

The use of standard pipeline decomposition helps analyze and localize these issues, but ultimately the system still exhibits end-to-end F1 scores on the order of 33–47% (depending on partial/exact match definitions and error cascading).
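A back-of-the-envelope illustration of why the end-to-end numbers sit so far below the per-component ones, treating the GS+noEP figures above as independent per-stage accuracies (which oversimplifies the real error interactions):

```python
# Rough upper-bound intuition: if each stage's errors simply pass through,
# end-to-end accuracy is near the product of per-stage accuracies.
# Figures are the GS+noEP scores from the table in Section 4.
stage_scores = [0.9725, 0.9794, 0.9402, 0.8677]  # connective, arg position,
                                                 # arg extraction, explicit sense
end_to_end = 1.0
for score in stage_scores:
    end_to_end *= score
# end_to_end comes out near 0.777, already well below any single stage.
```

Even with every explicit-path component above 86% F1, the compounded figure drops toward the high 70s; adding the much weaker non-explicit classifier and exact span matching pushes observed end-to-end F1 into the reported 33–47% range.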

6. Significance and Impact on Discourse Parsing

The PDTB-styled parser represents a technical milestone in decomposing discourse relation classification into explicit, modular statistical sub-problems. Its reliance on feature-rich maximum entropy classification, LaTeX-formalized pseudocode for transparent componentization, and meticulous evaluation metrics provide a template for subsequent research.

Major findings include:

  • Explicit relations are tractable for shallow models with parse-based features, but implicit relations remain challenging, with low F1 even under strong annotation conditions.
  • Attribution span detection benefits from clause segmentation but is sensitive to error propagation.
  • Component-wise analysis is crucial for diagnosing failure cases and optimizing multi-stage pipelines.

These methodological insights and error analyses have influenced later neural and end-to-end transformer-based systems, which seek to obviate or replace explicit feature engineering with learned contextual representations. Nevertheless, the PDTB pipeline’s modular decomposition remains influential for both benchmarking and error interpretation in contemporary discourse relation classification research.
