Continuous LexRank

Updated 6 March 2026

The paper introduces continuous LexRank, replacing binary similarity thresholds with fully weighted cosine similarity graphs to capture nuanced contextual relations.
Continuous LexRank employs a stochastic random walk with a damping factor to compute sentence centrality, analogous to the PageRank algorithm.
Empirical evaluations on DUC datasets show that continuous LexRank achieves ROUGE-1 scores competitive with thresholded LexRank, generally above the centroid baseline, and remains robust in noisy conditions.

Continuous LexRank is a graph-based method for computing sentence salience in extractive multi-document text summarization. It generalizes the original LexRank algorithm by using fully weighted, real-valued sentence similarity graphs rather than thresholded binary graphs. This approach represents sentences as nodes in a continuous cosine similarity network, enabling fine-grained measurement of semantic proximity for centrality-based ranking. The algorithm employs a stochastic random-walk with damping, analogous to PageRank, to compute the stationary centrality distribution over sentences. Empirical analysis demonstrates its competitive performance in large-scale summarization evaluations and its robustness to noisy or imperfect topical clusters.

1. Continuous LexRank: Definition and Core Principles

Continuous LexRank formulates summarization as the identification of the most "central" sentences in a multi-document cluster, based on inter-sentence lexical similarity. Each sentence $s_k$ is represented as a high-dimensional tf·idf vector $p_k$ (not to be confused with the centrality vector $p$ of Section 2), where the component for word $w$ is the product of the term frequency of $w$ in $s_k$ and the inverse document frequency of $w$, capturing both local and global importance. Sentence pairs $(i, j)$ are then scored by cosine similarity:

$w_{ij} = \frac{\sum_{w} \mathrm{tf}_{w,i} \mathrm{idf}_w \, \mathrm{tf}_{w,j} \mathrm{idf}_w}{ \sqrt{\sum_w (\mathrm{tf}_{w,i} \mathrm{idf}_w)^2 } \sqrt{\sum_w (\mathrm{tf}_{w,j} \mathrm{idf}_w)^2} }$

A weighted adjacency matrix $W = [w_{ij}]$ is created, leveraging the full real-valued similarity range rather than binarizing with a threshold as in traditional (thresholded) LexRank. This yields a denser, information-preserving sentence graph.
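
The construction of $W$ can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the function names `toy_idf` and `tfidf_cosine_matrix` are hypothetical, tokenization is plain whitespace splitting, and the idf values are computed with a smoothed formula over the example sentences themselves, which is an assumption made only to keep the example self-contained.

```python
import math
from collections import Counter

import numpy as np


def toy_idf(sentences):
    """Smoothed idf computed over the sentences themselves (an assumption)."""
    n = len(sentences)
    df = Counter(word for sent in sentences for word in set(sent))
    return {word: math.log((1 + n) / (1 + count)) + 1.0 for word, count in df.items()}


def tfidf_cosine_matrix(sentences, idf):
    """Return W with W[i, j] = cosine similarity of the tf*idf vectors of i and j."""
    vocab = sorted({word for sent in sentences for word in sent})
    column = {word: k for k, word in enumerate(vocab)}
    vectors = np.zeros((len(sentences), len(vocab)))
    for i, sent in enumerate(sentences):
        for word, tf in Counter(sent).items():
            vectors[i, column[word]] = tf * idf[word]
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0  # leave empty sentences as all-zero rows
    unit = vectors / norms
    return unit @ unit.T


# Hypothetical toy cluster: two related sentences and one unrelated sentence.
sentences = [
    "the cat sat on the mat".split(),
    "the cat lay on the rug".split(),
    "stocks fell sharply on monday".split(),
]
W = tfidf_cosine_matrix(sentences, toy_idf(sentences))
```

The resulting matrix is symmetric with ones on the diagonal, and the two cat sentences are more similar to each other than either is to the stock-market sentence. No entry is discarded, which is the defining property of the continuous variant.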

2. Random Walk Centrality Computation

The algorithm constructs a row-stochastic transition matrix $B$ by normalizing the rows of $W$ so each sums to one. Centrality is then defined as the stationary distribution $p$ of a random walk with damping factor $d \in [0,1]$. A uniform "teleportation" matrix $U$ with all entries $1/N$ (where $N$ is the number of sentences) is incorporated to ensure ergodicity:

$M = d U + (1-d) B$

The final centrality vector satisfies:

$p = M^T p$

or in expanded form,

$p_j = (1-d) \sum_{i=1}^N B_{ij} p_i + d \frac{1}{N}$
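
The expanded form follows from the matrix form in one step, using only the definitions above ($U_{ij} = 1/N$) and the fact that $p$ is a probability distribution:

$p_j = \sum_{i=1}^N M_{ij} p_i = \sum_{i=1}^N \left[ \frac{d}{N} + (1-d) B_{ij} \right] p_i = \frac{d}{N} \sum_{i=1}^N p_i + (1-d) \sum_{i=1}^N B_{ij} p_i$

Since $\sum_{i=1}^N p_i = 1$, the first term reduces to $d \frac{1}{N}$.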

This formulation has the same form as PageRank; the differences are that the graph is undirected, because cosine similarity is symmetric, and that self-links are typically present. Numerically, $p$ is computed by the power method, starting from the uniform distribution and iterating until convergence ($\|p^{(t+1)} - p^{(t)}\| < \epsilon$).
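
The power iteration can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `continuous_lexrank`, the tolerance `eps`, the iteration cap `max_iter`, and the example matrix are all hypothetical, and every row of $W$ is assumed to have a positive sum (which holds when self-similarities are kept on the diagonal). The parameter `d` multiplies the uniform term exactly as in $M = dU + (1-d)B$, so a small value keeps most of the weight on the similarity graph; the value 0.15 used below is an illustrative choice, not one taken from the text.

```python
import numpy as np


def continuous_lexrank(W, d, eps=1e-10, max_iter=1000):
    """Stationary distribution of M = d*U + (1-d)*B via the power method."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    B = W / W.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    p = np.full(n, 1.0 / n)               # start from the uniform distribution
    for _ in range(max_iter):
        # p_j = d/N + (1-d) * sum_i B_ij p_i, valid because p sums to one
        p_next = d / n + (1.0 - d) * (B.T @ p)
        converged = np.abs(p_next - p).sum() < eps
        p = p_next
        if converged:
            break
    return p


# Hypothetical similarity matrix: sentence 1 is similar to both neighbours.
W = np.array([
    [1.0, 0.5, 0.1],
    [0.5, 1.0, 0.5],
    [0.1, 0.5, 1.0],
])
p = continuous_lexrank(W, d=0.15)
```

In this toy graph the middle sentence receives the highest score, and the two outer sentences tie by symmetry.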

3. Thresholded versus Continuous LexRank

Standard LexRank ("thresholded LexRank") constructs a binary adjacency matrix with $A_{ij} = 1$ if $w_{ij} \ge t$ and $0$ otherwise, where the threshold $t$ is tuned empirically ($t=0.1$ gave the strongest results in the reported experiments). This keeps only the strongest edges and discards all similarity information below the threshold. By contrast, continuous LexRank retains all computed cosine similarities, so the graph reflects the full range of relatedness within the cluster. In the reported ROUGE-1 results the two variants differ by less than 0.01: continuous LexRank avoids the information loss of binary discretization, but it does not outperform the thresholded variant on every task (Erkan et al., 2011).
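
The difference between the two graphs is easy to see on a small example. The sketch below uses a hypothetical 3×3 similarity matrix and a hypothetical helper `thresholded_adjacency`; it only illustrates the binarization rule stated above.

```python
import numpy as np


def thresholded_adjacency(W, t):
    """Binary graph of thresholded LexRank: A_ij = 1 if w_ij >= t, else 0."""
    return (np.asarray(W) >= t).astype(float)


# Hypothetical cosine similarities between three sentences.
W = np.array([
    [1.00, 0.42, 0.08],
    [0.42, 1.00, 0.15],
    [0.08, 0.15, 1.00],
])
A = thresholded_adjacency(W, t=0.1)
```

With $t=0.1$, the edges with weights 0.42 and 0.15 both become 1 and the edge with weight 0.08 is dropped, so the binary graph no longer distinguishes a strong link from a weak one. The continuous variant would use `W` directly.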

4. Empirical Performance and DUC Evaluations

Extensive evaluations were conducted on DUC 2003/2004 and cross-lingual DUC 2004 datasets, using 665-byte generic extractive summaries and ROUGE-1 recall as the main metric. Comparative assessments included random sentence selection, lead-based summaries, centroid methods (tf·idf centroid pseudo-documents), degree centrality, thresholded LexRank, and continuous LexRank. The damping parameter was fixed at $d=0.85$, and for thresholded methods the optimal threshold was found at $t=0.1$.
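
For orientation, ROUGE-1 recall is the fraction of reference unigrams that also appear in the candidate summary, with counts clipped. The sketch below is a simplified single-reference version with whitespace tokenization and lowercasing; the function name `rouge1_recall` is hypothetical, and the official ROUGE toolkit differs in details such as multi-reference handling and optional stemming.

```python
from collections import Counter


def rouge1_recall(candidate, reference):
    """Clipped unigram overlap divided by the number of reference unigrams."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / sum(ref.values())


score = rouge1_recall("the cat sat", "the cat sat on the mat")
```

Here the reference has six unigrams and the candidate matches three of them ("the" once, "cat", "sat"), so `score` is 0.5.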

A summary of representative ROUGE-1 results:

Dataset / Method	Centroid	Degree (t=0.1)	LexRank (t=0.1)	Continuous LexRank
DUC2003 Task2	0.362	0.360	0.367	0.365
DUC2004 Task2	0.367	0.371	0.374	0.376
DUC2004 Task4a (MT)	0.383	0.393	0.397	0.396
DUC2004 Task4b (Human)	0.403	0.403	0.405	0.397

In the table above, thresholded LexRank beats the centroid baseline on every task, and continuous LexRank does so on three of the four (the exception is DUC2004 Task4b, 0.397 versus 0.403). Continuous LexRank has the highest ROUGE-1 score on DUC2004 Task2 (0.376), while thresholded LexRank is highest on the other three tasks; the gap between the two variants never exceeds 0.008. According to the DUC official rankings, these LexRank variants consistently place among the top systems, frequently within the 95% confidence interval of the leading peer (Erkan et al., 2011).

5. Robustness to Topical Noise

Continuous LexRank demonstrates strong robustness to noisy clusters. Experiments where 2 unrelated “off-topic” documents were injected into each 12-document DUC cluster (≈17% noise) revealed that continuous LexRank’s ROUGE-1 dropped by less than 0.01 absolute (e.g., from 0.376 to 0.369 on DUC2004 Task2). In contrast, baseline systems such as lead or random selection showed much larger degradations. This insensitivity to clustering imperfections is attributed to the random-walk nature of prestige flow, which dilutes the influence of isolated or off-topic nodes via global centrality computation (Erkan et al., 2011).
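
The dilution effect can be illustrated on a synthetic graph. The sketch below is a toy demonstration, not a reproduction of the DUC experiment: the similarity values (0.5 among on-topic sentences, 0.02 between on-topic and off-topic sentences, 0.05 between the off-topic sentences), the helper `centrality`, and the choice $d=0.15$ are all assumptions made for illustration.

```python
import numpy as np


def centrality(W, d, eps=1e-10, max_iter=1000):
    """Damped power iteration on the row-normalized similarity matrix."""
    n = W.shape[0]
    B = W / W.sum(axis=1, keepdims=True)
    p = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        p_next = d / n + (1.0 - d) * (B.T @ p)
        converged = np.abs(p_next - p).sum() < eps
        p = p_next
        if converged:
            break
    return p


n_on, n_off = 4, 2                 # four on-topic and two off-topic sentences
W = np.full((n_on + n_off, n_on + n_off), 0.02)  # weak cross-topic similarity
W[:n_on, :n_on] = 0.5              # on-topic sentences resemble each other
W[n_on:, n_on:] = 0.05             # off-topic sentences barely resemble each other
np.fill_diagonal(W, 1.0)           # self-similarity

scores = centrality(W, d=0.15)
on_topic, off_topic = scores[:n_on], scores[n_on:]
```

Every on-topic sentence ends up with a higher score than either off-topic sentence, because the off-topic nodes receive almost no mass from the well-connected part of the graph.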

6. Advances: Continuous-Similarity Graphs and Joint Ranking Extensions

Recent graph-based summarization methods such as RepRank (Li et al., 2020) generalize the continuous LexRank approach by leveraging continuous sentence and word embeddings (e.g., GloVe and self-attention representations), and constructing joint sentence/word/keyword similarity graphs. RepRank maintains all edge weights as real-valued cosine similarities and performs a joint random walk over both sentences and keywords in a unified eigenvector problem. Experimental results on DUC-2002 and DUC-2007 reported higher ROUGE-1/2 than standard LexRank, indicating that continuous, embedding-based similarity matrices capture semantic relatedness beyond surface lexical overlap. The absorbing random walk variant further improves redundancy handling with only minor performance tradeoffs. This suggests that the conceptual framework of continuous LexRank is a foundation for further advances in joint and semantic-centrality summarization algorithms (Li et al., 2020).

7. Summary and Significance

Continuous LexRank formalizes sentence importance in summarization as eigenvector centrality within a fully weighted, undirected similarity graph. It replaces binary edge thresholding with fine-grained continuous affinities, computes stationary centrality via a damped power method, and achieves ROUGE-1 scores on par with thresholded LexRank and, on most tasks, above the centroid baseline. Its resilience to noisy cluster composition and its extensions in embedding-driven frameworks affirm its ongoing relevance as a principled, robust approach for extractive summarization grounded in global sentence similarity structure (Erkan et al., 2011; Li et al., 2020).

References

LexRank: Graph-based Lexical Centrality as Salience in Text Summarization (2011)

Unsupervised Summarization by Jointly Extracting Sentences and Keywords (2020)
