
Facing Changes: Continual Entity Alignment for Growing Knowledge Graphs (2207.11436v1)

Published 23 Jul 2022 in cs.CL and cs.DB

Abstract: Entity alignment is a basic and vital technique in knowledge graph (KG) integration. Over the years, research on entity alignment has resided on the assumption that KGs are static, which neglects the nature of growth of real-world KGs. As KGs grow, previous alignment results face the need to be revisited while new entity alignment waits to be discovered. In this paper, we propose and dive into a realistic yet unexplored setting, referred to as continual entity alignment. To avoid retraining an entire model on the whole KGs whenever new entities and triples come, we present a continual alignment method for this task. It reconstructs an entity's representation based on entity adjacency, enabling it to generate embeddings for new entities quickly and inductively using their existing neighbors. It selects and replays partial pre-aligned entity pairs to train only parts of KGs while extracting trustworthy alignment for knowledge augmentation. As growing KGs inevitably contain non-matchable entities, different from previous works, the proposed method employs bidirectional nearest neighbor matching to find new entity alignment and update old alignment. Furthermore, we also construct new datasets by simulating the growth of multilingual DBpedia. Extensive experiments demonstrate that our continual alignment method is more effective than baselines based on retraining or inductive learning.

The paper "Facing Changes: Continual Entity Alignment for Growing Knowledge Graphs" (Wang et al., 2022 ) addresses the realistic scenario of aligning entities between knowledge graphs (KGs) that are constantly growing. Existing entity alignment methods primarily focus on static KGs, which is a limitation when dealing with real-world KGs like DBpedia and Wikidata that are frequently updated with new entities and triples. This growth necessitates a method that can continually discover and update entity alignments without costly retraining from scratch.

The authors define the problem of continual entity alignment between growing KGs, where KGs evolve over time with the addition of new triples and entities, but the set of initial seed alignments for training remains constant. This setting poses several challenges:

  1. Efficiently handling new entities: How to learn embeddings for new entities that the pre-trained model has not seen before.
  2. Capturing potential alignment: Identifying alignments for both existing and new entities in the presence of non-matchable entities.
  3. Integrating predictions: Combining new alignment predictions with previously discovered alignments, resolving potential conflicts.

To tackle these challenges, the paper proposes ContEA (Continual Entity Alignment). The method consists of two main modules:

  1. Subgraph-based Entity Alignment (at time t=0): This module initializes the alignment process.
    • It uses a Graph Neural Network (GNN) encoder, specifically adapting the Dual-AMN architecture, to represent entities based on their neighborhood structures within each KG.
    • An entity reconstruction objective is introduced ($\mathcal{L}_{\text{reconstruct}} = \sum_{e} \left\Vert \mathbf{e} - \frac{1}{|\mathcal{N}_{e}|} \sum_{e' \in \mathcal{N}_e} \mathbf{e}' \right\Vert^{2}_{2}$), which encourages an entity's embedding to be close to the mean embedding of its neighbors. This objective is crucial for enabling inductive embedding generation for new entities (see the first sketch after this list).
    • The standard alignment learning objective ($\mathcal{L}_{\text{align}}$) is used, minimizing the distance between known aligned pairs and maximizing the distance to sampled negative pairs, incorporating techniques like LogSumExp loss and in-batch negative sampling. The overall loss at $t=0$ is $\mathcal{L}_{1} = \mathcal{L}_{\text{align}} + \alpha \cdot \mathcal{L}_{\text{reconstruct}}$.
    • After training, trustworthy alignment search is performed. Instead of a simple nearest neighbor search, which assumes every entity has a counterpart, ContEA employs a bidirectional nearest neighbor search: an alignment $(e_1, e_2)$ is considered trustworthy only if $e_2$ is the nearest neighbor of $e_1$ in the target KG, AND $e_1$ is the nearest neighbor of $e_2$ in the source KG. This helps mitigate issues with non-matchable entities (see the second sketch after this list).
  2. Embedding and Alignment Update (at time t > 0): This module handles the KG growth.
    • When new triples arrive, new entities' embeddings are initialized inductively using the learned GNN encoder and the entity reconstruction objective, based on their seen neighbors (neighbors that already existed in the previous snapshot). This avoids random initialization and leverages the pre-trained model's ability to represent entities from their context (the neighbor-mean initialization shown in the first sketch after this list).
    • The GNN encoder is finetuned. The inner-graph aggregation layer ($Aggregator_1$) is frozen, preserving the basic neighbor aggregation pattern, while the cross-graph matching layer ($Aggregator_2$) is made learnable to adapt to the structural changes in both KGs.
    • The finetuning uses a combination of:
      • Affected Seed Alignment (ASA): Replaying seed alignment pairs that involve entities affected by new triples.
      • Selected Trustworthy Alignment (TA): Using a fixed number ($m$) of previously predicted trustworthy alignments with the highest similarity scores as additional "new anchors" for training.
    • The finetuning loss is $\mathcal{L}_{2} = \mathcal{L}_{\text{align}}(\text{ASA}) + \alpha \cdot \mathcal{L}_{\text{reconstruct}} + \beta \cdot \mathcal{L}_{\text{align}}(\text{TA})$.
    • After finetuning, new trustworthy alignment predictions are generated using the updated model and embeddings.
    • A trustworthy alignment update strategy is employed to integrate the newly predicted alignments with the accumulated old ones. New alignments involving entirely new entities are kept. For conflicts (an entity aligned to different counterparts over time), the alignment with the higher similarity score is preferred. This ensures that the set of predicted alignments is cumulative and can correct previous mistakes (see the final sketch below).
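
To make the reconstruction objective and its inductive reuse concrete, here is a minimal PyTorch sketch. The function names, the dense adjacency representation, and the toy graph are illustrative choices, not the paper's released code:

```python
import torch

def reconstruction_loss(embeddings: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
    """L_reconstruct: pull each entity toward the mean embedding of its neighbors.

    embeddings: (n, d) entity embedding matrix.
    adj: (n, n) binary adjacency matrix without self-loops.
    """
    degrees = adj.sum(dim=1, keepdim=True).clamp(min=1)   # |N_e|, guard isolated nodes
    neighbor_means = (adj @ embeddings) / degrees         # mean neighbor embedding per entity
    return ((embeddings - neighbor_means) ** 2).sum()     # sum of squared L2 distances

def init_new_entity(seen_neighbor_embs: torch.Tensor) -> torch.Tensor:
    """Inductive initialization at t > 0: a new entity starts at the mean of
    the embeddings of its already-seen neighbors, mirroring the objective above."""
    return seen_neighbor_embs.mean(dim=0)

# Toy usage: 4 entities in a chain graph 0-1-2-3.
emb = torch.randn(4, 8, requires_grad=True)
adj = torch.tensor([[0., 1., 0., 0.],
                    [1., 0., 1., 0.],
                    [0., 1., 0., 1.],
                    [0., 0., 1., 0.]])
loss = reconstruction_loss(emb, adj)   # combined as L1 = L_align + alpha * loss
loss.backward()
```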
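
Next, a sketch of the bidirectional (mutual) nearest-neighbor search, assuming dot-product similarity over the two KGs' embedding matrices; the NumPy implementation and helper name are assumptions:

```python
import numpy as np

def trustworthy_alignments(src_emb: np.ndarray, tgt_emb: np.ndarray):
    """Return (i, j, sim) triples where j is i's nearest target entity AND
    i is j's nearest source entity; all other entities stay unaligned."""
    sim = src_emb @ tgt_emb.T          # (n1, n2) similarity matrix
    best_tgt = sim.argmax(axis=1)      # nearest target for each source entity
    best_src = sim.argmax(axis=0)      # nearest source for each target entity
    return [(i, int(j), float(sim[i, j]))
            for i, j in enumerate(best_tgt)
            if best_src[j] == i]       # keep mutual nearest neighbors only
```

Because the test is mutual, a non-matchable entity whose nearest neighbor "prefers" some other entity simply produces no alignment, rather than a forced wrong one.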
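
Finally, a sketch of the conflict-resolving update, keyed on source entities for brevity (the dict layout and names are assumptions; the paper's strategy also handles conflicts on the target side):

```python
def update_alignments(old: dict, new: dict) -> dict:
    """Merge newly predicted trustworthy alignments into the accumulated set.

    Both dicts map a source entity to a (target_entity, similarity) pair.
    Pairs for previously unseen sources are added; on conflict, the pair
    with the higher similarity wins, letting later snapshots fix old errors.
    """
    merged = dict(old)
    for src, (tgt, score) in new.items():
        if src not in merged or score > merged[src][1]:
            merged[src] = (tgt, score)
    return merged
```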

To evaluate ContEA, the authors construct three new datasets based on the widely-used DBP15K benchmark (ZH-EN, JA-EN, FR-EN). For each language pair, they simulate KG growth by creating six snapshots ($t=0$ to $t=5$), iteratively adding new triples containing existing or new entities. The training seed alignment set remains constant across snapshots, reflecting the realistic difficulty of obtaining new ground truth for emerging entities. The test set, however, grows to include potential alignments involving new entities.
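
As a rough illustration of such a growth simulation (the split ratio, shuffling, and equal batching here are assumptions, not the paper's exact construction procedure):

```python
import random

def make_snapshots(triples, n_snapshots=6, base_fraction=0.5, seed=0):
    """Reveal a KG's triples incrementally: snapshot 0 holds a base portion,
    and the remainder arrives in equal batches at t = 1 .. n_snapshots - 1."""
    rng = random.Random(seed)
    pool = list(triples)
    rng.shuffle(pool)
    n_base = int(len(pool) * base_fraction)
    snapshots = [pool[:n_base]]
    rest = pool[n_base:]
    step = -(-len(rest) // (n_snapshots - 1))   # ceiling division
    for t in range(n_snapshots - 1):
        snapshots.append(snapshots[-1] + rest[t * step:(t + 1) * step])
    return snapshots                            # cumulative, growing snapshots
```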

Experiments compare ContEA against several retraining baselines (MTransE, GCN-Align, AlignE, AliNet, KEGCN, Dual-AMN), which train from scratch on each snapshot, and inductive baselines ($\mathrm{MEAN}^{+}$, $\mathrm{LAN}^{+}$, DINGAL-O), which can handle new entities but may not fully adapt the alignment network. Evaluation uses Precision, Recall, and F1 scores against the growing test set at each snapshot.
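
The metrics reduce to set comparison between predicted and gold alignment pairs; a minimal sketch (the set-of-tuples representation is an assumption):

```python
def precision_recall_f1(predicted: set, gold: set):
    """Score predicted alignment pairs (src, tgt) against the gold pairs."""
    tp = len(predicted & gold)                            # true positives
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1
```

Precision matters here because, with non-matchable entities present, blindly predicting an alignment for every entity is penalized rather than rewarded.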

The results show that ContEA consistently outperforms all baselines on F1 scores across all snapshots and datasets. This is attributed to its ability to leverage prior knowledge (model parameters and previous alignments) and efficiently handle new entities. While performance naturally declines over time for all methods due to increasing KG size and a relatively static seed set, ContEA's decline is less steep. ContEA is also significantly more time-efficient than retraining baselines. Ablation studies confirm the importance of both the selected trustworthy alignment and the affected seed alignment replay for finetuning performance. ContEA also demonstrates superior recall in discovering alignment for new entities specifically.

Further analysis explores incorporating entity names. Using fastText name embeddings significantly improves ContEA's performance. Combining ContEA's structural approach with a simple Google Translate + Levenshtein distance approach also yields strong results, often outperforming either method alone, suggesting complementarity between structural and lexical/translational information. A case study illustrates ContEA's ability to correct previously incorrect alignments as KGs grow and more information becomes available.
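
To illustrate how such a lexical signal could be combined with structural similarity, here is a hedged sketch using classic edit distance and a linear interpolation; the weight `w` and the combination rule are assumptions, as the summary above does not specify the exact formula:

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]

def name_similarity(a: str, b: str) -> float:
    """Edit distance normalized into [0, 1]; 1.0 means identical names."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

def combined_score(structural_sim: float, name_a: str, name_b: str,
                   w: float = 0.5) -> float:
    """Interpolate structural and lexical evidence (w is a hypothetical weight)."""
    return w * structural_sim + (1 - w) * name_similarity(name_a, name_b)
```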

In conclusion, the paper effectively defines and addresses the practical problem of continual entity alignment on growing KGs. ContEA provides an effective and efficient solution by combining subgraph-based representation learning, an entity reconstruction objective for inductive capabilities, and a finetuning strategy that leverages both previous knowledge and newly discovered high-confidence alignments. The introduced datasets are a valuable contribution for future research in this area. Future work could explore more complex growth scenarios (new relations, deletions), more sophisticated alignment update strategies, and better integration of auxiliary information like text.

Authors (7)
  1. Yuxin Wang (132 papers)
  2. Yuanning Cui (7 papers)
  3. Wenqiang Liu (18 papers)
  4. Zequn Sun (32 papers)
  5. Yiqiao Jiang (3 papers)
  6. Kexin Han (4 papers)
  7. Wei Hu (308 papers)
Citations (9)