The paper "LLM-Align: Utilizing LLMs for Entity Alignment in Knowledge Graphs" (Chen et al., 6 Dec 2024 ) presents a novel framework, LLM-Align, that leverages the reasoning and instruction-following capabilities of LLMs to improve Entity Alignment (EA) across different Knowledge Graphs (KGs). The core idea is to use LLMs to perform fine-grained matching on a set of candidate entities, addressing the limitations of traditional embedding-based methods which often lack deep semantic understanding.
The paper highlights that while embedding-based methods are effective for learning structural features, they struggle with the rich semantic information contained in entity attributes and relations. Previous attempts to integrate LLMs faced challenges with the large scale of KG data and the reliability issues (like hallucination and positional bias) inherent in LLMs. LLM-Align is designed to mitigate these challenges.
The proposed LLM-Align framework operates in three main stages:
- Candidate Alignment Selection: This initial stage uses an existing embedding-based EA model (GCN-Align or DERA-R in the experiments) to generate entity embeddings and compute similarity scores. For each source entity, a small set of top-k nearest neighbors from the target KG is selected as alignment candidates, which keeps the input to the subsequent LLM reasoning stages manageable. The similarity scores from the base model are then discarded to avoid biasing the LLM (a sketch of this stage follows the list).
- Attribute-based Reasoning: This stage uses the LLM to reason about potential alignments based on entity attributes. To keep the LLM input manageable and informative, heuristic methods select the most important attributes for the source entity and its candidates. The importance (identifiability) of an attribute is calculated from its function degree (how unique its value is for a given entity) and its frequency among the candidate entities (see the identifiability sketch after this list). The top-k attributes with the highest identifiability are selected, and their triples are included in an "Attribute-aware Prompt" for the LLM. A multi-round voting mechanism is applied here to enhance reliability.
- Relation-based Reasoning: If attribute-based reasoning does not yield a confident alignment (based on the multi-round voting), this stage is performed. Analogous to attribute selection, heuristic rules select the most informative relations based on their function degree and frequency among the candidates. The selected relation triples are included in a "Relation-aware Prompt" for the LLM, and the multi-round voting mechanism is applied again.
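A minimal sketch of the candidate-selection stage, assuming precomputed entity embeddings from a base EA model; the function name and the cosine-similarity choice are illustrative, not taken from the paper:

```python
import numpy as np

def top_k_candidates(source_emb: np.ndarray,
                     target_emb: np.ndarray,
                     k: int = 10) -> np.ndarray:
    """For each source entity, return the indices of the k most similar
    target entities by cosine similarity (illustrative; the paper uses
    the base model's own similarity scores)."""
    src = source_emb / np.linalg.norm(source_emb, axis=1, keepdims=True)
    tgt = target_emb / np.linalg.norm(target_emb, axis=1, keepdims=True)
    sim = src @ tgt.T                          # (n_source, n_target)
    # Keep only the candidate indices; the scores themselves are discarded
    # before the LLM stage to avoid biasing it.
    return np.argsort(-sim, axis=1)[:, :k]
```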
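The identifiability heuristic can be sketched as follows. The paper combines an attribute's function degree with its frequency among the candidates; the product form, the data layout (entity → attribute → set of values), and all names below are assumptions for illustration:

```python
def identifiability(attr: str,
                    entity_attrs: dict[str, dict[str, set[str]]],
                    candidates: list[str]) -> float:
    """Score an attribute by function degree times candidate frequency
    (the product is an assumed combination, not confirmed by the paper)."""
    # Function degree: entities carrying the attribute divided by the total
    # number of values it takes, so single-valued ("functional") attributes
    # score close to 1.
    value_counts = [len(vals[attr]) for vals in entity_attrs.values() if attr in vals]
    func_degree = len(value_counts) / sum(value_counts) if value_counts else 0.0
    # Frequency: fraction of candidate entities that carry this attribute.
    freq = sum(attr in entity_attrs.get(c, {}) for c in candidates) / len(candidates)
    return func_degree * freq
```

The same scoring idea applies to relations in the relation-based stage, with relation triples in place of attribute triples.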
The framework formats the EA task as a single-choice selection problem for the LLM. Three types of prompts are defined: Knowledge-driven (entity names only), Attribute-aware (entity names + selected attribute triples), and Relation-aware (entity names + selected relation triples).
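As an illustration, a single-choice Attribute-aware prompt might be assembled as below; the template wording and the function name are hypothetical, not the paper's exact prompts:

```python
def build_attribute_prompt(source: str,
                           source_triples: list[tuple[str, str, str]],
                           candidates: list[str],
                           cand_triples: dict[str, list[tuple[str, str, str]]]) -> str:
    """Format EA as a single-choice question (hypothetical template)."""
    lines = [f"Source entity: {source}"]
    lines += [f"  {attr}: {value}" for (_, attr, value) in source_triples]
    lines.append("Which candidate refers to the same real-world entity?")
    for i, cand in enumerate(candidates):
        lines.append(f"({chr(ord('A') + i)}) {cand}")
        lines += [f"    {attr}: {value}" for (_, attr, value) in cand_triples.get(cand, [])]
    lines.append("Answer with a single option letter.")
    return "\n".join(lines)
```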
A key contribution is the Multi-round Voting Mechanism, which addresses LLM issues like positional bias (a tendency to favor options at the beginning or end of a list) and hallucination. For a given source entity and its candidates, the candidate list is permuted multiple times, and the LLM performs reasoning on each permutation, generating independent outputs. A final alignment is chosen only if a candidate entity receives a majority vote (selected in at least $\lceil n/2 \rceil$ of the $n$ voting rounds). If no candidate receives a majority, the LLM stage outputs no alignment for that entity. This ensemble-like approach improves accuracy and stability.
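A minimal sketch of the voting loop, assuming an `llm_select` callable that wraps a prompted LLM call and returns one candidate name; the majority threshold follows the description above:

```python
import random
from collections import Counter
from math import ceil
from typing import Callable, Optional

def multi_round_vote(source: str,
                     candidates: list[str],
                     llm_select: Callable[[str, list[str]], str],
                     n_rounds: int = 5) -> Optional[str]:
    """Query the LLM on several permutations of the candidate list and
    accept an answer only if it wins a majority of the rounds."""
    votes: Counter[str] = Counter()
    for _ in range(n_rounds):
        order = random.sample(candidates, len(candidates))  # fresh permutation
        votes[llm_select(source, order)] += 1
    winner, count = votes.most_common(1)[0]
    # No alignment is emitted if no candidate reaches ceil(n/2) votes.
    return winner if count >= ceil(n_rounds / 2) else None
```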
Experiments are conducted on the DBP15K cross-lingual datasets (ZH-EN, JA-EN, FR-EN), using GCN-Align and DERA-R as base models for candidate selection and Qwen1.5-14B-Chat and Qwen1.5-32B-Chat as LLMs for reasoning.
The results demonstrate the effectiveness of LLM-Align:
- It significantly improves the Hits@1 performance compared to the base models alone. For instance, combining GCN-Align with Qwen1.5-32B increased Hits@1 by over 34% on all datasets. With the stronger DERA-R base, LLM-Align with Qwen1.5-32B achieved SOTA Hits@1 scores (98.3% on ZH-EN, 97.6% on JA-EN, 99.5% on FR-EN), improving DERA-R's Hits@1 by 0.4% to 3.2%.
- Ablation studies show that the Attribute-based Reasoning, Relation-based Reasoning, and Multi-round Voting modules all contribute to performance, with multi-round voting being particularly effective at mitigating errors. The modules are complementary, achieving the best performance when used together.
- Analysis of candidate alignment orders confirms LLMs' positional bias, with performance being best when the candidate list order from the base model is preserved (which often places the correct answer closer to the top). Random or reversed orders generally degrade performance.
- Analysis of LLM size shows a clear positive correlation between model size (from 1.5B to 32B) and EA performance, especially for more difficult alignment cases where the base model failed to rank the correct entity highly. This suggests larger LLMs have better reasoning capabilities for EA. However, a minimum effective size exists (1.5B was near random chance).
- With pure knowledge-driven prompting, performance degrades as the number of candidate entities grows, since LLMs struggle to reason over many options at once.
In summary, LLM-Align effectively integrates the strengths of traditional embedding-based methods (efficient candidate selection) with the semantic reasoning power of LLMs. By employing heuristic selection of informative triples and a multi-round voting mechanism, it addresses key challenges in applying LLMs to EA, leading to state-of-the-art performance. The framework's modular design and analysis of key components provide practical insights for future LLM-based EA research.