
Revisiting Semi-Supervised Learning with Graph Embeddings (1603.08861v2)

Published 29 Mar 2016 in cs.LG

Abstract: We present a semi-supervised learning framework based on graph embeddings. Given a graph between instances, we train an embedding for each instance to jointly predict the class label and the neighborhood context in the graph. We develop both transductive and inductive variants of our method. In the transductive variant of our method, the class labels are determined by both the learned embeddings and input feature vectors, while in the inductive variant, the embeddings are defined as a parametric function of the feature vectors, so predictions can be made on instances not seen during training. On a large and diverse set of benchmark tasks, including text classification, distantly supervised entity extraction, and entity classification, we show improved performance over many of the existing models.

Citations (1,943)

Summary

  • The paper introduces Planetoid, a novel framework that jointly optimizes embeddings for label prediction and neighborhood context in SSL.
  • It combines supervised loss with unsupervised graph embedding objectives, achieving up to 6.1% accuracy improvement and high recall across tasks.
  • The approach supports both transductive and inductive scenarios, enhancing scalability and real-world applicability in various domains.

Revisiting Semi-Supervised Learning with Graph Embeddings

The paper "Revisiting Semi-Supervised Learning with Graph Embeddings" by Zhilin Yang, William W. Cohen, and Ruslan Salakhutdinov addresses the challenge of enhancing semi-supervised learning (SSL) by leveraging graph embeddings. The paper proposes a novel framework named Planetoid (Predicting Labels And Neighbors with Embeddings Transductively Or Inductively from Data), which integrates graph embedding techniques into SSL, addressing both transductive and inductive learning paradigms.

Semi-supervised learning algorithms commonly optimize a composite objective function comprising a supervised loss over labeled data and an unsupervised loss over both labeled and unlabeled data. Traditional graph-based SSL methods incorporate a graph Laplacian regularization term, encoding the assumption that nearby nodes in a graph are likely to have similar labels. Graph Laplacian regularization aligns predictions with the graph structure, but it does not generate useful features for the supervised task.
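For concreteness, the Laplacian penalty underlying these traditional methods can be sketched as follows. This is a minimal NumPy illustration, not code from the paper; the names `predictions` and `adjacency` are hypothetical:

```python
import numpy as np

def laplacian_penalty(predictions, adjacency):
    """Graph Laplacian regularizer: (1/2) * sum_{i,j} a_ij * ||f_i - f_j||^2.

    Penalizes prediction disagreement between connected nodes, but
    produces no new features for the supervised task.
    """
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency  # L = D - A
    # The quadratic form trace(F^T L F) equals the pairwise sum above.
    return np.trace(predictions.T @ laplacian @ predictions)
```

Note that if two connected nodes receive identical predictions, their contribution to the penalty is zero, which is exactly the "nearby nodes have similar labels" assumption.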

This research bridges the gap between graph-based SSL and recent advancements in unsupervised representation learning. The novelty of Planetoid lies in its joint training framework, where an embedding for each instance is optimized to predict both the class label and the neighborhood context in the graph. This method significantly differs from traditional graph Laplacian regularization in that it uses the embeddings—trained with distributional information from the graph—to boost task-related performance.
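A joint objective of this shape can be sketched as below. This is an illustrative skip-gram-style formulation with negative sampling, not the authors' exact architecture or sampling scheme, and all names are hypothetical:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def joint_loss(embeddings, logits, labels, pos_pairs, neg_pairs, lam=1.0):
    """Supervised cross-entropy plus an unsupervised context-prediction
    loss: positive (instance, context) pairs should score high, and
    sampled negative pairs should score low."""
    # Supervised part: softmax cross-entropy over labeled instances.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    sup = -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))
    # Unsupervised part: log-sigmoid scores of embedding dot products.
    pos = -np.mean([np.log(sigmoid(embeddings[i] @ embeddings[c]) + 1e-12)
                    for i, c in pos_pairs])
    neg = -np.mean([np.log(sigmoid(-embeddings[i] @ embeddings[c]) + 1e-12)
                    for i, c in neg_pairs])
    return sup + lam * (pos + neg)
```

Minimizing the unsupervised term pulls embeddings of graph neighbors together, so the embeddings themselves carry distributional information usable by the classifier.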

Framework Overview

The Planetoid framework is designed to handle both transductive and inductive learning scenarios:

  1. Transductive Variant:
    • In the transductive approach, the learned embeddings and input feature vectors are jointly utilized to determine class labels for instances observed during training.
    • This method is advantageous when the entire graph, including the test instances, is available during the learning phase.
  2. Inductive Variant:
    • The inductive method extends the model's applicability to unseen instances by defining embeddings as a parametric function of input feature vectors.
    • This allows the model to generalize and make predictions on instances that were not present in the observed graph during training.
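The structural difference between the two variants can be sketched with two hypothetical minimal classes (the actual model stacks neural network layers; this assumes only NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)

class TransductiveEmbedding:
    """One free embedding vector per graph node (a lookup table).
    Nodes unseen during training have no entry and cannot be embedded."""
    def __init__(self, num_nodes, dim):
        self.table = rng.normal(size=(num_nodes, dim))

    def embed(self, node_id):
        return self.table[node_id]

class InductiveEmbedding:
    """Embedding as a parametric function of the input features (here a
    single linear layer with ReLU), so any feature vector, including one
    from an instance absent from the training graph, can be embedded."""
    def __init__(self, feature_dim, dim):
        self.W = rng.normal(size=(feature_dim, dim)) * 0.1

    def embed(self, features):
        return np.maximum(features @ self.W, 0.0)
```

The transductive variant must see every instance at training time; the inductive one trades that restriction for the assumption that features alone determine a useful embedding.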

Empirical Evaluation

The authors conducted extensive experiments on five diverse datasets, showcasing the efficacy of the Planetoid framework across text classification, distantly supervised entity extraction, and entity classification tasks. The results are notable for their significant improvement over existing methods. Key findings include:

  • On text classification datasets such as Citeseer, Cora, and Pubmed, the inductive variant of Planetoid (Planetoid-I) outperformed competing methods like manifold regularization (ManiReg) and semi-supervised embedding (SemiEmb) by up to 6.1% in accuracy.
  • For the distantly-supervised entity extraction task using the DIEL dataset, both the transductive and inductive Planetoid variants achieved the highest recall rates, with Planetoid-I performing best with a 50.1% recall@k, demonstrating a robust performance relative to the baseline DIEL.
  • In the NELL entity classification task, Planetoid-I showed a significant improvement in accuracy over SemiEmb at lower labeling rates (up to 18.7% gain).

Implications and Future Directions

The Planetoid framework presents a substantial advancement in incorporating graph structures into semi-supervised learning paradigms, effectively boosting performance by integrating label information into the embedding process. The separation of transductive and inductive approaches allows the framework to be versatile and applicable to different SSL scenarios.

From a practical perspective, the enhanced performance of the Planetoid framework could facilitate more accurate and scalable semi-supervised learning applications across various domains, including text classification and knowledge graph-based tasks. Theoretically, this work underscores the potential of joint training frameworks and embedding-based regularization methods in SSL, creating a precedent for future research in this direction.

Future research could explore the application of Planetoid to more complex neural architectures, such as recurrent networks, and investigate additional datasets where graph structures are derived from distance metrics between feature vectors. Additionally, optimizing and expanding the inductive variant could further enhance its applicability and scalability to large-scale, real-world tasks.
