
Enriched Elementary Discourse Units

Updated 28 November 2025
  • Enriched Elementary Discourse Units are self-contained text segments that integrate contextual, inferential, and event-centric details for clearer document representation.
  • They are generated using methods like CCA-based segmentation and LLM-driven extraction, which combine word embeddings with enriched entity and temporal cues.
  • Empirical evaluations show that enriched EDUs boost tasks such as summarization, semantic similarity, and long-term memory retrieval via graph-based organization.

Enriched Elementary Discourse Units (EDUs) are discrete, linguistically or semantically coherent segments derived from text or conversation, explicitly structured to facilitate downstream processing such as information retrieval, summarization, document understanding, and long-term agent memory. While standard EDUs typically correspond to minimal discourse segments (e.g., clauses) identified by rhetorical structure theory or clause-boundary segmentation, the "enriched" variant extends the concept, incorporating additional contextual, inferential, event-centric, or structural information—most saliently, entity normalization, temporal cues, discourse relations, and application-specific structural signatures. Enriched EDUs serve as atomic information units, bridging the gap between surface text and abstracted, manipulable document or conversation representations for neural and symbolic architectures.

1. Formal Definitions and Typology

Elementary Discourse Units arise in two canonical traditions: linguistic discourse parsing and recent event-centric memory schemas for LLM agents. Standard segmentations (e.g., RST) define each EDU as a minimal contiguous span of tokens, typically clauses or clause-like segments $e_k = \{ w_{s_k}, \dots, w_{t_k} \}$ (Koto et al., 2019). In the enriched paradigm, an EDU is additionally a bundle:

  • Textual content: a self-contained, event-like sentence that aggregates “who did what, when, where, why,” optionally spanning multiple turns or clauses;
  • Source annotation: index set $\text{src}(e)$ of supporting turns or sentence positions;
  • Temporal cue: normalized timestamp $T(e)$, derived or inferred from context;
  • Entity normalization: form-invariant resolution of referring expressions (e.g., converting “this conference” to “Global AI Innovation Symposium 2024”).

This generalized schema contrasts with the minimal, often clause-level, EDUs in classical discourse parsing and with highly fragmented relation triples, aiming instead to aggregate atomic events in a manner that preserves maximal local context for downstream tasks (Zhou, 21 Nov 2025).
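
A minimal sketch of how one such bundle might be represented in code is shown below; the class and field names are illustrative, not drawn from the cited papers.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class EnrichedEDU:
    """Illustrative container for one enriched EDU (field names are hypothetical)."""
    text: str                 # self-contained, event-like sentence
    src: List[int]            # indices of supporting turns or sentences
    timestamp: Optional[str]  # normalized temporal cue T(e), e.g. "2024-06-18"
    entities: dict = field(default_factory=dict)  # surface form -> normalized form

edu = EnrichedEDU(
    text="Bob presented the keynote at the Global AI Innovation Symposium 2024 in Tokyo.",
    src=[3, 4],
    timestamp="2024-06-18",
    entities={"this conference": "Global AI Innovation Symposium 2024"},
)
```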

2. Extraction and Enrichment Methodologies

2.1 Canonical Correlation Analysis–based Segmentation

Enriched EDUs can be discovered unsupervised via Canonical Correlation Analysis (CCA) on adjacent-sentence views. Consider $s_i, s_{i+1}$ as two “views” conditionally independent given a latent Gaussian state $L$, formalized as:

$$
\begin{aligned}
L &\sim \mathcal{N}(0, I_\ell), \\
s_i \mid L &\sim \mathcal{N}(W_x L + \mu_x, \Psi_x), \\
s_{i+1} \mid L &\sim \mathcal{N}(W_y L + \mu_y, \Psi_y),
\end{aligned}
$$

where $W_x, W_y$ are the CCA canonical directions and $\ell = \min(\dim(s_i), \dim(s_{i+1}))$ (Mehndiratta et al., 18 Jun 2024). Raw GloVe embeddings for each word in $s_i$ and $s_{i+1}$ are concatenated into feature matrices, from which the CCA projection learns shared latent factors.

The projected columns of the resulting matrix ($E_i = X_i W_x$) correspond to latent EDUs, one per canonical direction. Their number per sentence equals the minimum word count of the sentence pair, ensuring balanced latent representation.
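
The following sketch illustrates the projection step with scikit-learn's CCA, assuming word-embedding matrices for two adjacent sentences. The transposition convention (embedding dimensions as samples, words as features) is chosen so that the number of canonical directions equals the smaller word count; the exact preprocessing in the cited paper may differ.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def latent_edus(sent_i_emb: np.ndarray, sent_j_emb: np.ndarray):
    """Sketch of CCA-based latent EDU extraction for one adjacent-sentence pair.

    sent_*_emb: (n_words, emb_dim) word-embedding matrices (e.g. GloVe rows).
    Returns projected views E_i, E_j with one latent EDU per canonical direction.
    """
    # Treat embedding dimensions as samples and words as features, so the number
    # of canonical directions is the minimum word count of the two sentences.
    X = sent_i_emb.T                      # (emb_dim, n_words_i)
    Y = sent_j_emb.T                      # (emb_dim, n_words_j)
    n_comp = min(X.shape[1], Y.shape[1])
    cca = CCA(n_components=n_comp)
    E_i, E_j = cca.fit_transform(X, Y)    # columns = latent EDUs
    return E_i, E_j, cca

# Toy usage with random stand-ins for 50-d GloVe vectors (7-word and 5-word sentences).
rng = np.random.default_rng(0)
E_i, E_j, _ = latent_edus(rng.normal(size=(7, 50)), rng.normal(size=(5, 50)))
print(E_i.shape, E_j.shape)  # (50, 5) (50, 5)
```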

2.2 LLM-Driven Event-Based Extraction

Recent work instructs LLMs to decompose conversation history into lists of enriched, self-sufficient event statements. The extraction procedure for each session ss is as follows (Zhou, 21 Nov 2025):

  1. Assemble full session context: all turns, speakers, and timestamps;
  2. Prompt the LLM using a few-shot demonstration: “Rewrite as a list of enriched Elementary Discourse Units…” (specifying requirements for self-containment, entity normalization, event focus);
  3. Parse resulting triples $(\text{text}(e), \text{src}(e), T(e))$ into a structured, queryable representation.

Assistant responses are further partitioned into atomic events and structured summary chunks; all are incorporated into a global memory schema.
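
A schematic version of this extraction loop is sketched below; the prompt wording, the JSON output schema, and the `llm_call` hook are illustrative stand-ins rather than the authors' exact setup.

```python
import json

# Illustrative prompt; the cited paper's few-shot demonstration is not reproduced here.
EXTRACTION_PROMPT = """Rewrite the session below as a JSON list of enriched
Elementary Discourse Units. Each item must be a self-contained event sentence
("who did what, when, where, why") with normalized entity names, and must
include "src" (supporting turn indices) and "time" (a normalized timestamp).

Session:
{session}
"""

def extract_edus(session_text: str, llm_call) -> list:
    """llm_call: any function str -> str that queries an instruction-tuned LLM."""
    raw = llm_call(EXTRACTION_PROMPT.format(session=session_text))
    # Expected shape: [{"text": ..., "src": [...], "time": ...}, ...]
    return json.loads(raw)
```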

2.3 RST-Based Neural Encoding and Structuring

In neural document modeling, EDUs are delimited by off-the-shelf RST segmenters. Each span is then embedded via:

  • Token-level word and dependency-syntax embeddings;
  • BiLSTM-based contextualization within and across EDU spans;
  • Augmentation with shallow discourse features (nuclearity, relation scores, binary indicators) (Koto et al., 2019).

These representations can be concatenated with word vectors, injected at various network stages, or incorporated directly into attention calculations for downstream neural models.
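
The sketch below shows one plausible shape of such an encoder in PyTorch: a token-level BiLSTM pooled per EDU span, concatenation with shallow discourse features, and a second BiLSTM over spans. Dimensions, pooling choice, and the number of discourse features are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EDUEncoder(nn.Module):
    """Schematic two-level BiLSTM EDU encoder (hyperparameters are illustrative)."""
    def __init__(self, emb_dim=300, hid=128, n_discourse_feats=4):
        super().__init__()
        self.token_lstm = nn.LSTM(emb_dim, hid, bidirectional=True, batch_first=True)
        self.edu_lstm = nn.LSTM(2 * hid + n_discourse_feats, hid,
                                bidirectional=True, batch_first=True)

    def forward(self, edu_token_embs, discourse_feats):
        # edu_token_embs: (n_edus, max_tokens, emb_dim)
        # discourse_feats: (n_edus, n_discourse_feats), e.g. nuclearity/relation scores
        tok_out, _ = self.token_lstm(edu_token_embs)        # contextualize within spans
        span_vecs = tok_out.mean(dim=1)                     # mean-pool each EDU span
        span_vecs = torch.cat([span_vecs, discourse_feats], dim=-1)
        edu_out, _ = self.edu_lstm(span_vecs.unsqueeze(0))  # contextualize across spans
        return edu_out.squeeze(0)                           # (n_edus, 2 * hid)
```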

3. Structural and Semantic Enrichment Strategies

EDUs are further enriched beyond segmentation and basic embedding by:

  • Multi-modal/multi-view CCA: Adding part-of-speech tags, dependency relations, or other linguistic features as separate statistical views, then solving multi-view CCA for shared latent factors, enabling richer joint representations (Mehndiratta et al., 18 Jun 2024).
  • Contextualized embeddings: Substituting GloVe vectors with contextual embeddings (BERT, ELMo) prior to CCA to merge discourse-level and deep syntax–semantic information.
  • Clustering/Super-EDUs: Grouping similar EDUs via spectral clustering or dimensionality reduction (PCA, autoencoders), yielding higher-level phrase or topic units.
  • Weighted aggregation: Using canonical correlation scores $\rho_k$ to weight each EDU’s impact, especially in similarity calculations. Weighted aggregation of scores can increase robustness to irrelevant segments.
  • Joint content–structure modeling: Feeding EDU representations into shallow classifiers of discourse relations, allowing for similarity scoring or retrieval to respect structural cues like contrast or causality (Mehndiratta et al., 18 Jun 2024, Koto et al., 2019).
  • Event-argument parsing: For LLM-extracted EDUs, roles and arguments are identified (e.g., Agent = "Bob", Location = "Tokyo"), forming a basis for relational linking and inference in heterogeneous memory graphs (Zhou, 21 Nov 2025).

4. Computational Uses in Downstream Tasks

4.1 Semantic Similarity and Textual Matching

EDU-based matching, as in the CCA approach, allows explicit factor-wise comparison between latent components of sentence pairs. For each canonical index, cosine similarity is computed; the per-factor scores are then aggregated (either uniformly or weighted by $\rho_k$) and rescaled to the label range of semantic similarity benchmarks (e.g., STSB, Mohler) (Mehndiratta et al., 18 Jun 2024).
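
A compact sketch of the factor-wise scoring, assuming latent EDU matrices like those from Section 2.1 and optional canonical-correlation weights; the final rescaling to a benchmark's label range is task-specific and omitted here.

```python
import numpy as np

def edu_similarity(E_a, E_b, rho=None) -> float:
    """Factor-wise cosine similarity between latent EDU matrices.

    E_a, E_b: arrays whose columns are canonical directions (latent EDUs).
    rho: optional array of canonical correlations used as per-factor weights.
    """
    sims = np.array([
        float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
        for a, b in zip(E_a.T, E_b.T)
    ])
    if rho is None:
        return float(sims.mean())              # uniform aggregation
    return float((rho * sims).sum() / (rho.sum() + 1e-12))  # rho-weighted aggregation
```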

4.2 Long-Term Conversational Memory and Event Retrieval

An event-centric, enriched EDU schema underpins memory architectures for LLM agents. EDUs are stored as nodes in a heterogeneous graph linking sessions, events, and normalized arguments. Retrieval is performed via:

  • Dense similarity search between question and EDU/argument embeddings;
  • LLM-based mention detection and filtering for high-recall candidate sets;
  • Personalized PageRank propagation over the graph, seeded on highly relevant nodes, to aggregate multi-hop and associative evidence for complex QA tasks (Zhou, 21 Nov 2025).

This method supports high-precision retrieval under small token budgets for temporally or semantically distributed information.
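
The seed-selection step can be sketched as a simple dense-similarity ranking over node embeddings, as below; the cutoff and embedding source are assumptions, and the resulting seeds feed the PageRank step illustrated in Section 6.

```python
import numpy as np

def seed_nodes(query_emb, node_embs, top_k: int = 10):
    """Rank graph nodes (EDUs/arguments) by cosine similarity to the query embedding.

    query_emb: np.ndarray query vector; node_embs: {node_id: np.ndarray}.
    Returns the top_k node ids, used to seed personalized PageRank.
    """
    q = query_emb / (np.linalg.norm(query_emb) + 1e-12)
    scored = {
        nid: float(q @ (v / (np.linalg.norm(v) + 1e-12)))
        for nid, v in node_embs.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:top_k]
```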

4.3 Discourse-Aware Abstractive Summarization and Popularity Prediction

Neural models integrate EDU features (both shallow and latent) at encoder, post-encoder, or attention stages to enhance summarization performance. Feeding encoded EDUs directly into decoder architectures yields improved ROUGE-F1 scores (largely through increased recall) and, for regression tasks (e.g., petition popularity prediction), modest but consistent improvements in MAE and MAPE over vanilla word-only or sequence-level baselines (Koto et al., 2019).

5. Empirical Evaluations and Benchmarks

| Model/Method | Task | Key Metric | Result | Reference |
|---|---|---|---|---|
| CCA-EDU | STSB | Pearson’s r | 0.797 | (Mehndiratta et al., 18 Jun 2024) |
| BiLSTM+Attn+ELMo | STSB | Pearson’s r | 0.742 | (Mehndiratta et al., 18 Jun 2024) |
| GEN-SEN | STSB | Pearson’s r | 0.793 | (Mehndiratta et al., 18 Jun 2024) |
| CCA-EDU | Mohler | Pearson’s r | 0.512 | (Mehndiratta et al., 18 Jun 2024) |
| BiLSTM+CNN | Mohler | Pearson’s r | 0.517 | (Mehndiratta et al., 18 Jun 2024) |
| EMem-G (Enriched EDU+Graph) | LongMemEval$_S$ | Avg. Acc. % | 77.9 | (Zhou, 21 Nov 2025) |
| Full-context (101K tokens) | LongMemEval$_S$ | Avg. Acc. % | 55.0 | (Zhou, 21 Nov 2025) |
| Nemori | LongMemEval$_S$ | Avg. Acc. % | 64.2 | (Zhou, 21 Nov 2025) |
| EDU-augmented Summarizer | CNN/Daily Mail | ROUGE-F1 | +1.2 | (Koto et al., 2019) |

In multiple domains, EDU-enriched representations yield gains over deep LSTM or attention-only baselines, particularly under data scarcity or limited context.

6. Graph-Based Organization and Associative Reasoning

The heterogeneous memory graph formed from enriched EDUs, session nodes, and argument nodes encodes not just document structure but cross-session and cross-event relations crucial for multi-hop reasoning. Graph construction involves:

  • Node types: sessions, EDUs, arguments;
  • Edge types: session–EDU, EDU–argument, argument–synonym (thresholded by cosine similarity of argument embeddings);
  • Transition matrix for stochastic propagation;
  • Personalized PageRank (PPR) for multi-hop recall, tuned to query by initializing seed distribution on nodes most similar to the query embedding (Zhou, 21 Nov 2025).

This schema supports compact retrieval budgets while maintaining or exceeding accuracy compared to full-context or chunk-based retrieval.
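
The following sketch builds a small heterogeneous graph with networkx and runs personalized PageRank over it; the node schema, synonym threshold, and damping factor are illustrative assumptions rather than the paper's exact configuration.

```python
import networkx as nx

def build_memory_graph(sessions, arg_embs, syn_threshold=0.8):
    """Construct a heterogeneous memory graph (sketch).

    sessions: {session_id: [{"id": edu_id, "args": [arg, ...]}, ...]}
    arg_embs: {arg: unit-normalized embedding vector} for argument nodes.
    """
    G = nx.Graph()
    for sid, edus in sessions.items():
        G.add_node(sid, kind="session")
        for edu in edus:
            G.add_node(edu["id"], kind="edu")
            G.add_edge(sid, edu["id"])              # session–EDU edge
            for arg in edu["args"]:
                G.add_node(arg, kind="argument")
                G.add_edge(edu["id"], arg)          # EDU–argument edge
    args = list(arg_embs)
    for i, a in enumerate(args):                    # argument–synonym edges
        for b in args[i + 1:]:
            if float(arg_embs[a] @ arg_embs[b]) >= syn_threshold:
                G.add_edge(a, b)
    return G

def retrieve(G, seeds, top_k=5, alpha=0.85):
    """Personalized PageRank seeded on query-relevant nodes; returns top EDU nodes.

    seeds must contain at least one node present in G.
    """
    personalization = {n: (1.0 if n in seeds else 0.0) for n in G}
    scores = nx.pagerank(G, alpha=alpha, personalization=personalization)
    edus = [n for n in scores if G.nodes[n]["kind"] == "edu"]
    return sorted(edus, key=scores.get, reverse=True)[:top_k]
```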

7. Strengths, Limitations, and Prospective Directions

Advantages:

  • Principled granularity: Each event or segment is internally coherent for information retrieval or inference.
  • Entity and temporal normalization: Reduces ambiguity, supports cross-session and cross-turn matching, and aids in associative recall.
  • Non-compressive, lossless design: Particularly in the event-centric paradigm, full detail of events is preserved.
  • Empirical superiority: Outperforms baseline and full-context memory in long-horizon QA and document-level similarity tasks under various constraints.

Limitations:

  • Extraction dependence: Quality of entity normalization, event splitting, and context enrichment is bottlenecked by LLM extraction or syntactic parsing accuracy; errors may fragment or poorly parameterize events.
  • Omission of fine-grained preferences: The atomic, event-centered representation may discard subtle stylistic or attitudinal nuances, leading to reduced performance on fine-grained preference-tracking problems (Zhou, 21 Nov 2025).
  • Synonym and ontology schema: Argument-synonym links rely on simple similarity thresholds, suggesting that richer ontological constraints or advanced synonym detection could further densify the retrieval graph.
  • Applicability to other domains: While current frameworks primarily target document modeling, semantic similarity, summarization, and conversational memory, a plausible implication is that graph-enriched EDU schemas could serve as bridges for future symbolic-neural hybrid systems in dialogue systems and cross-document summarization.

Enriched EDUs represent a multi-faceted abstraction, derived through unsupervised learning, supervised parsing, or LLM-driven normalization, with widening applicability and potential for further semantic and relational enrichment (Mehndiratta et al., 18 Jun 2024, Koto et al., 2019, Zhou, 21 Nov 2025).
