A Survey on Knowledge Graphs: Representation, Acquisition, and Applications
The surveyed paper provides an extensive review of knowledge graph (KG) research, covering knowledge graph representation learning (KRL), knowledge acquisition and completion, temporal knowledge graphs, and knowledge-aware applications. It organizes the methods and milestones of the field into a structured taxonomy that can guide both future research and practical implementations.
Overview of Representation Learning
Representation learning in knowledge graphs, also called knowledge graph embedding (KGE), maps entities and relations to low-dimensional vectors that downstream tasks can consume. The paper organizes representation learning along four dimensions: representation space, scoring functions, encoding models, and the incorporation of auxiliary information.
- Representation Space:
- Point-Wise Space: Traditional methods like TransE and its variants (TransH, TransR) utilize real-valued vectors to represent entities and relations, capturing relational semantics via vector operations.
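- Complex Vector Space: To model relation patterns such as symmetry and antisymmetry, ComplEx embeds entities and relations in complex space and scores triples with a Hermitian dot product, while RotatE treats each relation as a rotation in the complex plane. QuatE extends the idea to quaternion (hypercomplex) space using the Hamilton product.
- Manifold and Group: ManifoldE relaxes point-wise embeddings to manifold-based ones, while hyperbolic embeddings (MuRP, AttH) use non-Euclidean space to better represent hierarchical data.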
- Gaussian Distribution: Models like KG2E and TransG introduce Gaussian distributions to represent uncertainties in entities and relations.
- Scoring Function:
- Distance-Based: Functions like the one in TransE score a triple by the distance between the translated head entity and the tail entity; a smaller distance means a more plausible fact.
- Semantic Matching: Bilinear models (e.g., DistMult, HolE) and tensor-based models (e.g., TuckER) perform semantic similarity computation using multiplicative interactions between entity and relation embeddings.
- Encoding Models:
- Linear/Bilinear Models: RESCAL, HolE, and SimplE are classical examples that use linear transformations and bilinear compositional operators.
- Neural Networks: Convolutional (ConvE, HypER), recurrent (RSN), and transformer-based models (CoKE, KG-BERT) capture richer semantic interactions than linear models.
- Auxiliary Information:
- KG models often enhance embeddings using external information such as textual descriptions (DKRL), type hierarchies (TKRL), and visual content (IKRL).
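To make the difference between the scoring families above concrete, the following is a minimal sketch in plain Python, not the reference implementation of any of the named models. It shows a TransE-style distance score, a DistMult-style multiplicative score, and a RotatE-style rotation score. The function names, the choice of the L1 norm, and the hand-picked toy embeddings are assumptions made for illustration; real models learn their embeddings from data.

```python
import cmath
import math


def transe_score(h, r, t):
    """Distance-based: negative L1 distance between (h + r) and t."""
    return -sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))


def distmult_score(h, r, t):
    """Semantic matching: trilinear product sum_i h_i * r_i * t_i."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def rotate_score(h, phases, t):
    """Complex space: rotate each h_i by the angle phases[i], then take the
    negative sum of the moduli of the differences to t."""
    return -sum(abs(hi * cmath.exp(1j * p) - ti) for hi, p, ti in zip(h, phases, t))


# Toy, hand-picked embeddings (assumptions for illustration only).
head, rel, tail = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
corrupted_tail = [0.0, 0.0]
# The true tail sits exactly at head + rel, so it scores higher than the corrupted one.
print(transe_score(head, rel, tail), transe_score(head, rel, corrupted_tail))

# A DistMult-style score is unchanged when head and tail are swapped.
a, diag, b = [1.0, 2.0], [0.5, -1.0], [2.0, 1.0]
print(distmult_score(a, diag, b) == distmult_score(b, diag, a))

# A RotatE-style score depends on direction: a quarter turn maps 1 to i, but not i to 1.
x, quarter_turn, y = [1 + 0j], [math.pi / 2], [1j]
print(rotate_score(x, quarter_turn, y), rotate_score(y, quarter_turn, x))
```

The swap test illustrates why a purely multiplicative diagonal score cannot distinguish a relation from its inverse, whereas a rotation in complex space can assign different scores to the two directions.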
Knowledge Acquisition and Completion
Knowledge acquisition encompasses constructing, extending, and refining KGs. For knowledge graph completion, the main method families are embedding-based models, relation path reasoning, reinforcement-learning-based path finding, rule-based reasoning, and meta relational learning.
- Embedding-Based Models:
- Embedding methods predict missing links by scoring candidate triples generated from known entities and relations, as exemplified by models like TransE and ProjE.
- Relation Path Reasoning:
- Path inference techniques utilize the multi-step relations within the KG. Methods like PRA and RNN-based path encoding models (e.g., Chain-of-Reasoning) aim to capture complex relational paths.
- RL-Based Path Finding:
- Leveraging reinforcement learning, models like DeepPath and MINERVA formulate pathfinding as a sequential decision-making problem, optimizing the exploration of KG paths.
- Rule-Based Reasoning:
- Methods such as KALE and RUGE inject logical rules into embedding learning, combining symbolic reasoning with learned representations so that models can apply deductive logic for more precise inference.
- Meta Relational Learning:
- To address long-tail relations in KGs, methods like GMatching and Meta-KGR apply few-shot learning principles to infer facts for relations that have only a handful of training instances.
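The idea of multi-step relation paths can be sketched with a small graph search. The example below is a simplified illustration rather than PRA itself: PRA uses random-walk probabilities along such paths as features, whereas this sketch only enumerates the relation sequences that connect two entities. The helper names (`build_index`, `relation_paths`) and the toy triples are hypothetical.

```python
from collections import defaultdict


def build_index(triples):
    """Map each head entity to its outgoing (relation, tail) edges."""
    index = defaultdict(list)
    for h, r, t in triples:
        index[h].append((r, t))
    return index


def relation_paths(index, source, target, max_hops=3):
    """Return the relation sequences of all simple paths from source to
    target that use at most max_hops edges, in sorted order."""
    found = []
    stack = [(source, (), (source,))]  # (current node, relations so far, visited nodes)
    while stack:
        node, rels, visited = stack.pop()
        if node == target and rels:
            found.append(rels)
            continue
        if len(rels) == max_hops:
            continue
        for r, nxt in index.get(node, []):
            if nxt not in visited:  # keep paths simple (no repeated entities)
                stack.append((nxt, rels + (r,), visited + (nxt,)))
    return sorted(found)


# Toy KG (illustrative triples only).
triples = [
    ("alice", "born_in", "paris"),
    ("paris", "city_of", "france"),
    ("alice", "works_for", "acme"),
    ("acme", "based_in", "france"),
    ("alice", "nationality", "france"),
]
index = build_index(triples)
for path in relation_paths(index, "alice", "france", max_hops=2):
    print(" -> ".join(path))
```

In a path-based completion model, paths such as `born_in -> city_of` would serve as evidence for a direct relation like `nationality`; the reinforcement learning approaches listed above learn which edges to follow instead of enumerating every path.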
Temporal Knowledge Graphs
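Temporal KGs attach temporal information to facts, extending the standard triple (head, relation, tail) to a quadruple that also carries a timestamp. Embedding and reasoning methods must then account for this additional time dimension.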
- Temporal Information Embedding:
- Methods like TTransE and HyTE extend static embeddings to include timestamps, enabling representation of temporal evolutions.
- Entity Dynamics:
- Dynamic KGs consider the temporal evolution of entities and their relations, with methods like Know-Evolve and RE-NET modeling how these change over time.
- Temporal Relational Dependency:
- Jiang et al. introduce temporal regularization to capture the ordering dependencies and constraints among relations in a relational chain.
- Temporal Logical Reasoning:
- Logical reasoning can also be extended to the temporal dimension: methods like RLvLR-Stream incorporate time-sensitive rules for more accurate temporal reasoning.
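As a rough sketch of how timestamps can enter a scoring function, the code below shows two simplified variants: a TTransE-style score that adds a time embedding to the translation, and a HyTE-style score that projects head, relation, and tail onto a timestamp-specific hyperplane before measuring the translation error. The function names, the L1 norm, and the toy vectors are assumptions for illustration; the published models learn these embeddings and differ in training details.

```python
def ttranse_style_score(h, r, tau, t):
    """Negative L1 distance between (h + r + tau) and t, where tau is the
    embedding of the timestamp."""
    return -sum(abs(hi + ri + ci - ti) for hi, ri, ci, ti in zip(h, r, tau, t))


def project(x, w):
    """Project x onto the hyperplane with unit normal w: x - (w . x) w."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return [xi - dot * wi for xi, wi in zip(x, w)]


def hyte_style_score(h, r, t, w):
    """Translation error measured after projecting h, r and t onto the
    hyperplane associated with one timestamp (unit normal w)."""
    ph, pr, pt = project(h, w), project(r, w), project(t, w)
    return -sum(abs(a + b - c) for a, b, c in zip(ph, pr, pt))


# Toy embeddings (assumptions for illustration only).
h, r, t = [1.0, 0.0], [0.0, 1.0], [1.5, 1.0]
tau_valid, tau_other = [0.5, 0.0], [-0.5, 0.0]
# The same (h, r, t) is plausible at one timestamp and less so at another.
print(ttranse_style_score(h, r, tau_valid, t), ttranse_style_score(h, r, tau_other, t))

h2, r2, t2 = [1.0, 5.0], [2.0, -3.0], [3.0, 9.0]
w_a, w_b = [0.0, 1.0], [1.0, 0.0]  # unit normals of two timestamp hyperplanes
print(hyte_style_score(h2, r2, t2, w_a), hyte_style_score(h2, r2, t2, w_b))
```

In both variants the same static triple receives different scores at different timestamps, which is what allows a temporal model to represent facts that hold only during certain periods.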
Knowledge-Aware Applications
KGs find extensive applications across various fields:
- Language Representation Learning:
- Knowledge-integrated pre-trained language models (e.g., ERNIE, K-BERT) enhance natural language understanding by injecting factual knowledge from KGs into language representations.
- Question Answering (QA):
- Simple and multi-hop QA models utilize KGs for factual queries (e.g., BAMnet) and complex reasoning (e.g., CogQA).
- Recommender Systems:
- Knowledge-enhanced recommender systems (e.g., MKR, KGAT) improve recommendation accuracy and interpretability by combining user-item interactions with side information about items drawn from KGs.
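The distinction between simple and multi-hop question answering over a KG can be illustrated with a symbolic traversal. This is only a sketch of the underlying query pattern, not of BAMnet or CogQA, which are neural models that must also map a natural-language question to entities and relations. The helper names (`build_index`, `answer_chain`) and the toy triples are hypothetical, and the relation chain is assumed to be given.

```python
from collections import defaultdict


def build_index(triples):
    """Map each (head, relation) pair to the set of matching tail entities."""
    index = defaultdict(set)
    for h, r, t in triples:
        index[(h, r)].add(t)
    return index


def answer_chain(index, topic_entity, relation_chain):
    """Follow the relations in order, starting from the topic entity, and
    return the set of entities reached after the last hop."""
    frontier = {topic_entity}
    for rel in relation_chain:
        frontier = {t for e in frontier for t in index.get((e, rel), ())}
    return frontier


# Toy KG (illustrative triples only).
triples = [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Christopher Nolan", "born_in", "London"),
    ("London", "capital_of", "United Kingdom"),
]
index = build_index(triples)

# Simple (single-hop) question: "Who directed Inception?"
print(answer_chain(index, "Inception", ["directed_by"]))
# Multi-hop question: "Where was the director of Inception born?"
print(answer_chain(index, "Inception", ["directed_by", "born_in"]))
```

A single-hop question maps to one lookup, while a multi-hop question requires chaining lookups, which is where errors in relation prediction compound and where the reasoning-oriented models mentioned above aim to help.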
Future Research Directions
- Complex Reasoning:
- Enhancing complex multi-hop reasoning through advanced relational path encoding and integration of probabilistic logic for handling uncertainties.
- Unified Framework:
- Moving towards a unified theoretical framework that covers the various KGE and reasoning methodologies consistently and comprehensively.
- Interpretability:
- Developing interpretable models that provide explanatory insights into predictions, essential for user trust and model transparency.
- Scalability:
- Addressing computational scalability for large-scale KGs without compromising model expressiveness.
- Knowledge Aggregation:
- Developing efficient methods for aggregating knowledge, possibly leveraging large-scale pretraining and novel neural architectures.
- Automatic Construction and Dynamics:
- Facilitating the automatic construction of KGs from unstructured data and accommodating dynamic changes in the embedded knowledge.
Conclusion
The survey paper maps the broad research landscape of KGs, covering foundational KGE techniques, knowledge acquisition methodologies, the integration of temporal dynamics, and diverse application domains. By consolidating separate research efforts and proposing future directions, it serves as a comprehensive guide for researchers who want to navigate and contribute to the evolving field of knowledge graphs.