Text-Aware Laplacian Propagation
- The paper demonstrates that integrating text-derived structures into graph-based label propagation reduces manual annotation by diffusing topic and seed labels efficiently.
- The methodology leverages a graph Laplacian framework with nodes defined by documents or words, using topic modeling and word embeddings to drive propagation.
- Empirical results show competitive macro-F1 scores and improved lexicon expansion performance, enabled by optimized hyperparameters, sparsification, and scalable batching techniques.
Text-aware Laplacian Propagation encompasses a family of label propagation algorithms for semi-supervised or weakly supervised classification in NLP, where the central innovation is the integration of text-derived structure—such as topics, document similarities, or word embeddings—into the graph-based propagation framework. Two representative and influential approaches are (1) Laplacian propagation on topic-enriched document graphs for weakly supervised document classification, and (2) label propagation over word-embedding graphs for emotion lexicon expansion. These methods leverage the graph Laplacian formalism, but they derive the node and edge structure from text features, enabling efficient label diffusion and drastically reducing manual annotation costs (Pawar et al., 2017, Giulianelli, 2017).
1. Graph Construction with Textual Structure
A text-aware Laplacian propagation method begins by defining a graph whose structure encodes semantically meaningful relationships between textual objects, such as documents or words.
For weakly supervised text classification (LPA–TD), the primary steps are:
- Nodes: Each document corresponds to one node, and additional nodes represent latent topics induced via Latent Dirichlet Allocation (LDA).
- Document-to-document edges: Similarity is quantified by cosine similarity between TF–IDF vectors. The similarity matrix is sparsified by thresholding, with the threshold chosen so that at least 90% of nodes retain a minimum number of neighbors; that minimum is a hyperparameter.
- Topic-to-document edges: The affinity matrix is defined via the LDA topic posteriors (the probability of each topic given a document), normalized per document.
- Balancing influence: The total topic-side weight on each document node is rescaled so that a topic-influence parameter, the fraction of a document's edge weight that comes from topics, controls the balance between topic and document influences.
- Adjacency matrix: The complete weighted adjacency matrix is block-structured, with separate blocks for the topic–document and document–document connections.
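The construction steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the fixed `sim_threshold` (instead of the 90%-coverage criterion), the placement of topic nodes first in the matrix, and the fallback for documents without document neighbors are all hypothetical choices.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def build_adjacency(tfidf, topic_post, sim_threshold, topic_share):
    """Return a symmetric (K + N) x (K + N) adjacency matrix, topic nodes first.

    tfidf:       N document vectors (assumed TF-IDF weights)
    topic_post:  N rows of K topic posteriors, one row per document
    topic_share: assumed fraction (0 < share < 1) of a document's total
                 edge weight that should come from topic nodes
    """
    n_docs, n_topics = len(tfidf), len(topic_post[0])
    size = n_topics + n_docs
    W = [[0.0] * size for _ in range(size)]

    # Document-document block: keep only similarities above the threshold.
    for i in range(n_docs):
        for j in range(i + 1, n_docs):
            s = cosine(tfidf[i], tfidf[j])
            if s >= sim_threshold:
                W[n_topics + i][n_topics + j] = s
                W[n_topics + j][n_topics + i] = s

    # Topic-document block: normalise posteriors per document, then rescale
    # so that topic edges carry `topic_share` of the document's total weight.
    for i in range(n_docs):
        doc_weight = sum(W[n_topics + i][n_topics:])
        post_sum = sum(topic_post[i])
        if doc_weight > 0:
            scale = topic_share / (1.0 - topic_share) * doc_weight
        else:
            scale = 1.0  # assumption: isolated documents keep raw posteriors
        for k in range(n_topics):
            w = scale * topic_post[i][k] / post_sum
            W[k][n_topics + i] = w
            W[n_topics + i][k] = w
    return W

# Toy data: three documents, two topics.
tfidf = [[1.0, 0.0, 1.0], [1.0, 0.0, 0.5], [0.0, 1.0, 0.0]]
topic_post = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
W = build_adjacency(tfidf, topic_post, sim_threshold=0.5, topic_share=0.5)
```

With `topic_share=0.5`, each connected document receives as much total weight from topic nodes as from its document neighbors.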
For lexicon expansion, the nodes represent word types and edges encode distributional similarity, often computed as a sigmoid transformation of cosine similarities between specialized (e.g., emotion-tuned) word embeddings. Edge weights can be parameterized and learned to optimize propagation quality, and large-scale graphs are handled efficiently via batching and sparsification (Giulianelli, 2017).
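As a rough illustration of a parameterized edge weight on a word graph, the sketch below applies a sigmoid to an affine function of cosine similarity. The parameter names `a` and `b` and the affine form are assumptions made for this example; the source states only that a sigmoid transformation of cosine similarity is used and that edge parameters can be learned.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def edge_weight(u, v, a=1.0, b=0.0):
    """Edge weight sigmoid(a * cos(u, v) + b); a and b are hypothetical
    learnable parameters controlling slope and offset."""
    return 1.0 / (1.0 + math.exp(-(a * cosine(u, v) + b)))

# Identical vectors get a larger weight than orthogonal ones.
w_same = edge_weight([1.0, 0.0], [1.0, 0.0])
w_orth = edge_weight([1.0, 0.0], [0.0, 1.0])
```

Increasing `a` sharpens the contrast between similar and dissimilar word pairs, which is one way a learned parameter could shape the graph.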
2. Laplacian-Based Propagation Algorithms
The label propagation step relies on the properties of the graph Laplacian and associated random-walk transition matrices.
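- Degree and Laplacian: With weighted adjacency matrix $W$ and diagonal degree matrix $D$, where $D_{ii} = \sum_j W_{ij}$, the (unnormalized) graph Laplacian $L = D - W$ is used, as are its row-normalized and symmetrically normalized variants.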
- Label propagation update:
- For topic-enriched document graphs, the label matrix is propagated iteratively along the normalized graph, while the labels on the labeled (topic) nodes are kept fixed as one-hot vectors.
- The fixed point of this iteration can also be written in closed form.
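One common way to realize an iterative update with clamped seeds is sketched below. It is an illustration under assumptions, not necessarily the exact update used in the cited work: it row-normalizes the adjacency matrix into a random-walk transition matrix, multiplies the label matrix by it, and resets the seed rows after every step. The function name `propagate` and the uniform initialization of unlabeled nodes are hypothetical.

```python
def propagate(W, seed_labels, n_classes, n_iter=200):
    """Iterative label propagation with clamped seeds.

    W:           symmetric adjacency matrix as a list of lists
    seed_labels: dict mapping node index -> class index (kept fixed)
    """
    n = len(W)
    # Row-normalise W into a random-walk transition matrix P.
    P = []
    for row in W:
        d = sum(row)
        P.append([w / d if d > 0 else 0.0 for w in row])

    def clamp(Y):
        # Seed nodes always carry their one-hot label.
        for node, c in seed_labels.items():
            Y[node] = [1.0 if k == c else 0.0 for k in range(n_classes)]

    # Unlabeled nodes start from a uniform distribution (an assumption).
    Y = [[1.0 / n_classes] * n_classes for _ in range(n)]
    clamp(Y)
    for _ in range(n_iter):
        Y = [[sum(P[i][j] * Y[j][k] for j in range(n))
              for k in range(n_classes)] for i in range(n)]
        clamp(Y)
    return Y

# Chain graph 0 - 1 - 2 - 3 with seeds at both ends.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
Y = propagate(W, seed_labels={0: 0, 3: 1}, n_classes=2)
```

On this chain, the interior nodes end up with class-0 scores of about 2/3 and 1/3, reflecting their distance from each seed.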
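Lexicon expansion (on word graphs): Writing $\ell$ for the labeled (seed) nodes and $u$ for the unlabeled nodes, and partitioning the adjacency matrix $W$ and the degree matrix $D$ accordingly, the harmonic-function solution is given by:
$$f_u = (D_{uu} - W_{uu})^{-1} W_{u\ell}\, f_\ell$$
or, with a random-walk transition matrix $P = D^{-1} W$,
$$f_u = (I - P_{uu})^{-1} P_{u\ell}\, f_\ell$$
The propagation objective is to minimize:
$$E(f) = \frac{1}{2} \sum_{i,j} W_{ij}\,(f_i - f_j)^2 = f^\top (D - W)\, f$$
where the constraint $f_\ell = y_\ell$ hard-clamps the seed labels (Pawar et al., 2017, Giulianelli, 2017).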
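The closed-form harmonic solution can be checked numerically on a small graph. The sketch below solves the linear system for the unlabeled nodes with a hand-written Gauss-Jordan routine so that it needs only the standard library; the helper names `solve` and `harmonic_solution` are hypothetical, and a production version would more likely call a sparse linear solver.

```python
def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

def harmonic_solution(W, labeled, Y_l):
    """Return (unlabeled node indices, their label scores).

    Solves (D_uu - W_uu) F_u = W_ul Y_l, where Y_l holds one one-hot
    row per labeled node, in the same order as `labeled`.
    """
    n = len(W)
    unlabeled = [i for i in range(n) if i not in labeled]
    deg = [sum(row) for row in W]
    n_classes = len(Y_l[0])
    L_uu = [[(deg[i] if i == j else 0.0) - W[i][j] for j in unlabeled]
            for i in unlabeled]
    rhs = [[sum(W[i][l] * Y_l[a][k] for a, l in enumerate(labeled))
            for k in range(n_classes)] for i in unlabeled]
    return unlabeled, solve(L_uu, rhs)

# Chain graph 0 - 1 - 2 - 3 with seeds at both ends.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
nodes, F_u = harmonic_solution(W, labeled=[0, 3], Y_l=[[1, 0], [0, 1]])
```

Each unlabeled node's score is the weighted average of its neighbors' scores, which is the defining property of a harmonic function.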
3. Minimal Supervision and Annotation Strategies
A key advantage of text-aware Laplacian propagation is the exploitation of linguistic structure to minimize human supervision:
Topic labeling for document graphs: Instead of labeling many documents, annotators assign only one label per topic node, and the number of topics is chosen in relation to the number of classes. Because each topic is summarized by a small set of high-probability words, the annotation burden is drastically reduced. Label propagation then diffuses these labels to the document nodes through the graph (Pawar et al., 2017).
Seed lexicon for words: For emotion lexicon expansion, only a small set of “seed” words requires initial labeling. Propagation then infers labels for the vast majority of word types (Giulianelli, 2017).
This approach has demonstrated empirical effectiveness. For example, on a 4,000-document HR grievance dataset, labeling only 8 topics achieved macro-F1 scores competitive with full supervision (2,800 labeled documents) (Pawar et al., 2017).
4. Hyperparameters, Optimization, and Scalability
Parameter selection and optimization play a critical role in text-aware Laplacian propagation.
Key hyperparameters:
- Number of topics: set in relation to the number of classes; a modest number is often sufficient.
- Topic-influence parameter: regulates the balance between the semantic prior (topics) and empirical similarity (document–document edges).
- Similarity threshold: controls graph sparsity; results are reported to be stable over a range of threshold values.
- Propagation parameter: usually set to favor deep diffusion.
- Learning edge parameters: In lexicon expansion, the edge-weight function depends on learnable parameters. These are learned via batch gradient descent to minimize the entropy of the predicted label distributions, with optimization performed either on the full graph or on random subgraphs for scalability.
- Sparsification and batching: To handle fully connected graphs efficiently, edge weights below a cutoff are zeroed and batched optimization is used. In lexicon expansion, batches of nodes enable consistent parameter learning at a fraction of the memory of the full graph (Giulianelli, 2017).
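The pieces of this optimization setup can be illustrated with a few small helpers: an average-entropy objective over predicted label distributions, a cutoff-based sparsifier, and a random subgraph sampler for batching. This is a sketch under assumptions; the names `mean_entropy`, `sparsify`, and `sample_subgraph`, the cutoff `eps`, and the uniform node sampling are hypothetical and do not reproduce the gradient-descent loop itself.

```python
import math
import random

def mean_entropy(Y):
    """Average Shannon entropy (in nats) of row-wise label distributions.
    Lower values mean more confident predictions."""
    total = 0.0
    for row in Y:
        total -= sum(p * math.log(p) for p in row if p > 0)
    return total / len(Y)

def sparsify(W, eps):
    """Zero every edge weight below the cutoff eps."""
    return [[w if w >= eps else 0.0 for w in row] for row in W]

def sample_subgraph(W, batch_size, rng):
    """Draw a random node subset and return it with its induced subgraph."""
    nodes = sorted(rng.sample(range(len(W)), batch_size))
    return nodes, [[W[i][j] for j in nodes] for i in nodes]

# Toy graph and predictions.
W = [[0.0, 0.9, 0.05],
     [0.9, 0.0, 0.2],
     [0.05, 0.2, 0.0]]
W_sparse = sparsify(W, eps=0.1)
nodes, W_batch = sample_subgraph(W_sparse, batch_size=2, rng=random.Random(0))
objective = mean_entropy([[1.0, 0.0], [0.5, 0.5]])
```

Evaluating the objective on a sampled subgraph rather than the full graph is what keeps memory bounded by the batch size instead of the vocabulary size.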
5. Empirical Evaluation and Performance
Text-aware Laplacian propagation exhibits strong empirical performance across multiple tasks.
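- Document classification (LPA–TD): On four binary 20 Newsgroups tasks (PC–MAC, MED–SPACE, POL–SCI, POL–REL), LPA–TD was compared against feature-labeling and document-labeling baselines. It achieved the highest macro-F1 on PC–MAC and POL–SCI, ranked second on MED–SPACE (behind NB-EM), and trailed TLC and ClassifyLDA on POL–REL: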
| Task | NB-EM | GE-FL | ClassifyLDA | TLC | Only-LPA | LPA–TD |
|---|---|---|---|---|---|---|
| PC–MAC | 0.429 | 0.666 | 0.641 | 0.680 | 0.486 | 0.704 |
| MED–SPACE | 0.990 | 0.939 | 0.926 | 0.943 | 0.919 | 0.951 |
| POL–SCI | 0.474 | 0.618 | 0.899 | 0.911 | 0.601 | 0.918 |
| POL–REL | 0.466 | 0.765 | 0.892 | 0.922 | 0.559 | 0.860 |
Removing incoherent topics improved scores by 1–2 points. On a real-world HR dataset, LPA–TD with only 8 topic labels outperformed several fully supervised classifiers in macro-F1 (Pawar et al., 2017).
- Lexicon expansion and emotion classification: The text-aware propagation method on word graphs achieved improved KL-divergence for lexicon expansion and micro-F1 gains of 1–2 points on emotion classification tasks over SVM and BiLSTM baselines. Batch label propagation matched full-graph performance, and the expanded lexicon consistently improved downstream classification (Giulianelli, 2017).
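For readers unfamiliar with the two metrics reported above, the following sketch computes macro-F1 from predicted labels and KL-divergence between two discrete distributions. The function names and the toy inputs are illustrative only and are unrelated to the reported results.

```python
import math

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)

def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions; assumes q > 0
    wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy binary task: one "pc" document is misclassified as "mac".
y_true = ["pc", "pc", "mac", "mac"]
y_pred = ["pc", "mac", "mac", "mac"]
score = macro_f1(y_true, y_pred)

# A predicted emotion distribution compared with a reference one.
same = kl_divergence([0.5, 0.5], [0.5, 0.5])
diff = kl_divergence([0.5, 0.5], [0.9, 0.1])
```

Macro-F1 weights every class equally regardless of size, and a lower KL-divergence indicates that predicted emotion distributions sit closer to the reference lexicon.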
6. Advantages, Limitations, and Future Directions
Text-aware Laplacian propagation is characterized by several advantages:
- Dramatic reduction in manual labeling—often only labeling topics or a lexicon seed set.
- Joint exploitation of global manifold (document–document) and semantic abstraction (topics or embeddings).
- Robustness across diverse text tasks.
However, limitations remain:
- Topic coherence is critical; incoherent topics impair propagation quality.
- The approach is transductive: adding new nodes (documents or words) necessitates re-solving the propagation equations; fully inductive extensions are nontrivial.
- Sensitivity to the graph-construction and propagation hyperparameters requires a small-scale grid search; automating this selection would enhance usability.
Future directions include detection and fuzzy-labeling of incoherent topics, and out-of-sample extensions for inductive inference (Pawar et al., 2017, Giulianelli, 2017).