
Text-Aware Laplacian Propagation

Updated 1 June 2026
  • The paper demonstrates that integrating text-derived structures into graph-based label propagation reduces manual annotation by diffusing topic and seed labels efficiently.
  • The methodology leverages a graph Laplacian framework with nodes defined by documents or words, using topic modeling and word embeddings to drive propagation.
  • Empirical results show competitive macro-F1 scores and improved lexicon expansion performance, enabled by optimized hyperparameters, sparsification, and scalable batching techniques.

Text-aware Laplacian Propagation encompasses a family of label propagation algorithms for semi-supervised or weakly supervised classification in NLP, where the central innovation is the integration of text-derived structure—such as topics, document similarities, or word embeddings—into the graph-based propagation framework. Two representative and influential approaches are (1) Laplacian propagation on topic-enriched document graphs for weakly supervised document classification, and (2) label propagation over word-embedding graphs for emotion lexicon expansion. These methods leverage the graph Laplacian formalism, but they derive the node and edge structure from text features, enabling efficient label diffusion and drastically reducing manual annotation costs (Pawar et al., 2017, Giulianelli, 2017).

1. Graph Construction with Textual Structure

A text-aware Laplacian propagation method begins by defining a graph whose structure encodes semantically meaningful relationships between textual objects, such as documents or words.

For weakly supervised text classification (LPA–TD), the primary steps are:

  • Nodes: Each document corresponds to one node, and additional nodes represent latent topics induced via Latent Dirichlet Allocation (LDA).
  • Document-to-document edges: Similarity is quantified by cosine similarity between TF–IDF vectors. The similarity matrix $A_{dd}$ is sparsified so that ≥90% of nodes have at least $K$ neighbors, where $K$ is a hyperparameter.
  • Topic-to-document edges: The affinity matrix $A_{td}$ is defined via LDA topic posteriors ($\theta_{d,t}$), normalized per document.
  • Balancing influence: The total topic-side influence $c_d$ on a document node $d$ is scaled so that topic edges account for a fraction $\tau$ of the node's edge weight; $\tau$ thus controls the balance between topic and document influences.
  • Adjacency matrix: The complete weighted adjacency matrix $W$ is block-structured to reflect topic–document and document–document connections:

$$W = \begin{pmatrix} 0 & A_{td} \\ A_{td}^T & A_{dd} \end{pmatrix}.$$

(Pawar et al., 2017)
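
The construction above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, a simple k-nearest-neighbor rule stands in for the threshold-based sparsification, and the balancing rule $c_d / (c_d + s_d) = \tau$ (with $s_d$ the document-side edge weight of $d$) is one reading of the description rather than a formula taken from the paper.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def doc_doc_affinity(tfidf, k):
    """A_dd: keep each document's k most similar neighbors, then symmetrize."""
    n = len(tfidf)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = [(cosine(tfidf[i], tfidf[j]), j) for j in range(n) if j != i]
        for s, j in sorted(sims, reverse=True)[:k]:
            if s > 0.0:
                a[i][j] = a[j][i] = s
    return a

def topic_doc_affinity(theta, a_dd, tau):
    """A_td (topics x docs): scale each document's topic posterior so that a
    fraction tau of its total edge weight comes from topic nodes."""
    n_docs, n_topics = len(theta), len(theta[0])
    a_td = [[0.0] * n_docs for _ in range(n_topics)]
    for d in range(n_docs):
        doc_weight = sum(a_dd[d])
        # Assumed rule: c_d / (c_d + doc_weight) == tau; isolated docs fall back to 1.0.
        c_d = tau * doc_weight / (1.0 - tau) if doc_weight > 0 else 1.0
        total = sum(theta[d])
        for t in range(n_topics):
            a_td[t][d] = c_d * theta[d][t] / total
    return a_td

def block_adjacency(a_td, a_dd):
    """W = [[0, A_td], [A_td^T, A_dd]] with topic nodes listed first."""
    n_topics, n_docs = len(a_td), len(a_dd)
    top = [[0.0] * n_topics + list(a_td[t]) for t in range(n_topics)]
    bottom = [[a_td[t][d] for t in range(n_topics)] + list(a_dd[d])
              for d in range(n_docs)]
    return top + bottom

# Toy data: 4 documents over a 3-term vocabulary, 2 topics.
tfidf = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 0.2, 0.9]]
theta = [[0.8, 0.2], [0.7, 0.3], [0.1, 0.9], [0.2, 0.8]]
a_dd = doc_doc_affinity(tfidf, k=1)
a_td = topic_doc_affinity(theta, a_dd, tau=0.4)
W = block_adjacency(a_td, a_dd)
```

With `tau=0.4`, each document row of `W` carries 40% of its weight on topic edges and 60% on document edges, which is the balance the parameter is meant to control.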

For lexicon expansion, the nodes represent word types and edges encode distributional similarity, often computed as a sigmoid transformation of cosine similarities between specialized (e.g., emotion-tuned) word embeddings. Edge weights can be parameterized and learned to optimize propagation quality, and large-scale graphs are handled efficiently via batching and sparsification (Giulianelli, 2017).
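
A minimal sketch of such a word graph follows. The `scale` and `shift` parameters of the sigmoid, the `min_weight` cutoff, and the toy two-dimensional embeddings are all hypothetical choices made for illustration; the source states only that edge weights are a parameterized sigmoid transformation of cosine similarity.

```python
from math import exp, sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def edge_weight(u, v, scale=1.0, shift=0.0):
    """Sigmoid of an affine transform of the cosine similarity (assumed form)."""
    return 1.0 / (1.0 + exp(-(scale * cosine(u, v) + shift)))

def word_graph(embeddings, scale=1.0, shift=0.0, min_weight=0.0):
    """Symmetric adjacency dict {word: {neighbor: weight}}; weak edges are dropped."""
    words = sorted(embeddings)
    graph = {w: {} for w in words}
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            weight = edge_weight(embeddings[a], embeddings[b], scale, shift)
            if weight >= min_weight:
                graph[a][b] = graph[b][a] = weight
    return graph

# Toy embeddings (hypothetical values, not taken from any trained model).
embeddings = {"joy": [0.9, 0.1], "happy": [0.8, 0.2], "grief": [-0.7, 0.6]}
graph = word_graph(embeddings, scale=4.0, min_weight=0.5)
```

In this toy example "joy" and "happy" stay connected, while "grief" ends up isolated because its sigmoid-transformed similarity to both other words falls below the cutoff.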

2. Laplacian-Based Propagation Algorithms

The label propagation step relies on the properties of the graph Laplacian and associated random-walk transition matrices.

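  • Degree and Laplacian: With the degree matrix $D$ defined by $D_{ii} = \sum_j W_{ij}$, the (unnormalized) graph Laplacian $L = D - W$ is used, as are its row-normalized and symmetrically normalized variants.
  • Label propagation update:
    • For topic-enriched document graphs, the label matrix is propagated iteratively: at each step every node's label distribution is recomputed from its neighbors' distributions through a normalized form of $W$, while the labels on the labeled (topic) nodes are kept fixed as one-hot vectors. The fixed point of this iteration can also be obtained in closed form by solving a linear system.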

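  • Lexicon expansion (on word graphs): Partitioning the nodes into labeled seeds ($\ell$) and unlabeled words ($u$), the harmonic-function solution is given by:

$$F_u = -L_{uu}^{-1} L_{u\ell} Y_\ell = (D_{uu} - W_{uu})^{-1} W_{u\ell} Y_\ell$$

or, with a random-walk transition matrix $P = D^{-1} W$,

$$F_u = (I - P_{uu})^{-1} P_{u\ell} Y_\ell.$$

The propagation objective is to minimize:

$$E(F) = \frac{1}{2} \sum_{i,j} W_{ij} \lVert F_i - F_j \rVert^2 = \operatorname{tr}(F^T L F)$$

where $L = D - W$ is the graph Laplacian and the constraint $F_\ell = Y_\ell$ hard-clamps the seed labels (Pawar et al., 2017; Giulianelli, 2017).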
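
Clamped propagation of this kind can be illustrated with a short iterative sketch. It assumes the row-normalized (random-walk) form, in which each unlabeled node repeatedly takes the weighted average of its neighbors' label distributions while seed rows are held fixed; the function name and the fixed iteration count are illustrative choices, and the papers' exact update rules may differ.

```python
def propagate(w, seeds, n_classes, n_iter=200):
    """Iterate F <- D^{-1} W F on unlabeled nodes, keeping seed rows clamped.

    w      : symmetric weight matrix as a list of lists
    seeds  : dict {node index: class index} for the clamped nodes
    Returns one label distribution (list of floats) per node.
    """
    n = len(w)
    f = [[1.0 / n_classes] * n_classes for _ in range(n)]
    for node, cls in seeds.items():
        f[node] = [1.0 if c == cls else 0.0 for c in range(n_classes)]
    for _ in range(n_iter):
        new_f = []
        for i in range(n):
            degree = sum(w[i])
            if i in seeds or degree == 0:
                new_f.append(f[i])  # clamped seed or isolated node: unchanged
                continue
            new_f.append([
                sum(w[i][j] * f[j][c] for j in range(n)) / degree
                for c in range(n_classes)
            ])
        f = new_f
    return f

# Chain graph 0 - 1 - 2 - 3 with unit weights; the two ends are seeds.
w = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
f = propagate(w, seeds={0: 0, 3: 1}, n_classes=2)
```

On this chain the iteration converges to the harmonic solution: node 1 receives approximately (2/3, 1/3) and node 2 approximately (1/3, 2/3), so each unlabeled node equals the average of its two neighbors.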

3. Minimal Supervision and Annotation Strategies

A key advantage of text-aware Laplacian propagation is the exploitation of linguistic structure to minimize human supervision:

  • Topic labeling for document graphs: Instead of labeling many documents, annotators assign only one label per topic node, with the number of topics chosen relative to the number of classes. Because each topic is summarized by a small set of high-probability words, the annotation burden is drastically reduced. Label propagation then diffuses these labels to the document nodes through the graph (Pawar et al., 2017).

  • Seed lexicon for words: For emotion lexicon expansion, only a small set of “seed” words requires initial labeling. Propagation then infers labels for the vast majority of word types (Giulianelli, 2017).

This approach has demonstrated empirical effectiveness. For example, in a 4,000-document HR grievance dataset, labeling only 8 topics achieved macro-F1 scores competitive with full-supervision (2,800 docs labeled) (Pawar et al., 2017).
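
To make the supervision budget concrete, the sketch below builds the initial label matrix from topic annotations alone. The class names, topic count, and document count are hypothetical, and the topic-nodes-first row ordering is an assumption chosen to match the block adjacency matrix.

```python
def seed_matrix(topic_labels, classes, n_docs):
    """One-hot rows for annotated topic nodes followed by all-zero rows
    for the unlabeled document nodes (topic nodes come first)."""
    index = {c: i for i, c in enumerate(classes)}
    rows = []
    for label in topic_labels:
        row = [0.0] * len(classes)
        row[index[label]] = 1.0
        rows.append(row)
    rows.extend([0.0] * len(classes) for _ in range(n_docs))
    return rows

# Hypothetical annotation: 4 topics labeled with one of 3 classes, 10 documents.
classes = ["pay", "leave", "transfer"]
topic_labels = ["pay", "pay", "leave", "transfer"]
y0 = seed_matrix(topic_labels, classes, n_docs=10)
n_annotations = len(topic_labels)
```

Here the annotator makes 4 labeling decisions, one per topic, and none of the 10 documents is labeled directly.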

4. Hyperparameters, Optimization, and Scalability

Parameter selection and optimization play a critical role in text-aware Laplacian propagation.

  • Key hyperparameters:

    • Number of topics: a count set in proportion to the number of classes is often sufficient.
    • Topic influence $\tau$: the fraction of edge weight assigned to topic edges; it regulates the balance between the semantic prior and empirical similarity.
    • Similarity threshold: controls graph sparsity; results are stable over a range of settings.
    • Propagation parameter: usually set to favor deep diffusion of labels through the graph.
  • Learning edge parameters: In lexicon expansion, the edge-weight function is parameterized, and its parameters are learned via batch gradient descent to minimize the entropy of the predicted label distributions, with optimization performed either on the full graph or on random subgraphs for scalability.
  • Sparsification and batching: To handle fully connected graphs efficiently, edge weights below a threshold are zeroed, and batched optimization is used. In lexicon expansion, batching over subsets of nodes enables consistent parameter learning at a fraction of the memory required by the full graph (Giulianelli, 2017).
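
The three ingredients named above (an entropy objective, threshold sparsification, and random node batches) can be sketched as small helper functions. The names, the threshold value, and the batch size are illustrative assumptions; the gradient step itself is omitted, since the source does not specify the form of the edge-weight function.

```python
import random
from math import log

def mean_entropy(distributions):
    """Average Shannon entropy (in nats) of predicted label distributions.
    Lower values mean more confident predictions."""
    total = 0.0
    for dist in distributions:
        total -= sum(p * log(p) for p in dist if p > 0.0)
    return total / len(distributions)

def sparsify(w, threshold):
    """Return a copy of the weight matrix with entries below `threshold` zeroed."""
    return [[x if x >= threshold else 0.0 for x in row] for row in w]

def random_batches(n_nodes, batch_size, seed=0):
    """Shuffle node indices and split them into batches; each batch induces
    a subgraph on which edge parameters can be updated."""
    order = list(range(n_nodes))
    random.Random(seed).shuffle(order)
    return [order[i:i + batch_size] for i in range(0, n_nodes, batch_size)]

confident = mean_entropy([[1.0, 0.0], [0.0, 1.0]])
uncertain = mean_entropy([[0.5, 0.5], [0.5, 0.5]])
sparse = sparsify([[0.0, 0.9, 0.05], [0.9, 0.0, 0.2], [0.05, 0.2, 0.0]], threshold=0.1)
batches = random_batches(n_nodes=10, batch_size=4)
```

A dense batch of `b` nodes stores `b * b` weights instead of `n * n`, which is where the memory saving of batched optimization comes from.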

5. Empirical Evaluation and Performance

Text-aware Laplacian propagation exhibits strong empirical performance across multiple tasks.

  • Document classification (LPA–TD): On four binary 20 Newsgroups tasks (PC–MAC, MED–SPACE, POL–SCI, POL–REL), LPA–TD achieved the highest macro-F1 on PC–MAC and POL–SCI; it was second to NB-EM on MED–SPACE and trailed ClassifyLDA and TLC on POL–REL. Macro-F1 by task and method:

Task        NB-EM   GE-FL   ClassifyLDA   TLC     Only-LPA   LPA–TD
PC–MAC      0.429   0.666   0.641         0.680   0.486      0.704
MED–SPACE   0.990   0.939   0.926         0.943   0.919      0.951
POL–SCI     0.474   0.618   0.899         0.911   0.601      0.918
POL–REL     0.466   0.765   0.892         0.922   0.559      0.860

Removing incoherent topics improved scores by 1–2 points. On a real-world HR dataset, LPA–TD with only 8 topic labels achieved a macro-F1 that outperformed several fully supervised classifiers (Pawar et al., 2017).

  • Lexicon expansion and emotion classification: The text-aware propagation method on word graphs achieved improved KL-divergence for lexicon expansion and micro-F1 gains of 1–2 points on emotion classification tasks over SVM and BiLSTM baselines. Batch label propagation matched full-graph performance, and the expanded lexicon consistently improved downstream classification (Giulianelli, 2017).
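
Since the results above are reported as macro-F1 and micro-F1, a short sketch of how the two averages differ may help. The function name and the toy labels are illustrative; the definitions are the standard ones (macro averages per-class F1 scores, micro pools the counts across classes).

```python
def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for two equal-length label sequences."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    tp_all = fp_all = fn_all = 0
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
        tp_all, fp_all, fn_all = tp_all + tp, fp_all + fp, fn_all + fn
    macro = sum(per_class) / len(per_class)
    pooled = 2 * tp_all + fp_all + fn_all
    micro = 2 * tp_all / pooled if pooled else 0.0
    return macro, micro

# Imbalanced toy example: three "a" documents, one "b" document.
y_true = ["a", "a", "a", "b"]
y_pred = ["a", "a", "b", "b"]
macro_f1, micro_f1 = f1_scores(y_true, y_pred)
```

In this example the per-class F1 scores are 0.8 for "a" and 2/3 for "b", so macro-F1 is about 0.733 while micro-F1 is 0.75; macro-F1 gives the rare class equal weight.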

6. Advantages, Limitations, and Future Directions

Text-aware Laplacian propagation is characterized by several advantages:

  • Dramatic reduction in manual labeling—often only labeling topics or a lexicon seed set.
  • Joint exploitation of global manifold (document–document) and semantic abstraction (topics or embeddings).
  • Robustness across diverse text tasks.

However, limitations remain:

  • Topic coherence is critical; incoherent topics impair propagation quality.
  • The approach is transductive: adding new nodes (documents or words) necessitates re-solving the propagation equations; fully inductive extensions are nontrivial.
  • Sensitivity to the key hyperparameters requires a small-scale grid search; automating this selection would improve usability.

Future directions include detection and fuzzy-labeling of incoherent topics, and out-of-sample extensions for inductive inference (Pawar et al., 2017, Giulianelli, 2017).
