
Attribute-Consistent Knowledge Graph Representation Learning for Multi-Modal Entity Alignment (2304.01563v1)

Published 4 Apr 2023 in cs.CL

Abstract: Multi-modal entity alignment (MMEA) aims to find all equivalent entity pairs between multi-modal knowledge graphs (MMKGs). Rich attributes and neighboring entities are valuable for the alignment task, but existing works ignore the contextual gap problem that aligned entities may have different numbers of attributes in a specific modality when learning entity representations. In this paper, we propose a novel attribute-consistent knowledge graph representation learning framework for MMEA (ACK-MMEA) to compensate for the contextual gaps by incorporating consistent alignment knowledge. Attribute-consistent KGs (ACKGs) are first constructed via multi-modal attribute uniformization with merge and generate operators, so that each entity has one and only one uniform feature in each modality. The ACKGs are then fed into a relation-aware graph neural network with random dropouts to obtain aggregated relation representations and robust entity representations. To evaluate how ACK-MMEA facilitates entity alignment, we specially design a joint alignment loss for both entity and attribute evaluation. Extensive experiments conducted on two benchmark datasets show that our approach achieves excellent performance compared to its competitors.

Authors (7)
  1. Qian Li (236 papers)
  2. Shu Guo (39 papers)
  3. Yangyifei Luo (5 papers)
  4. Cheng Ji (40 papers)
  5. Lihong Wang (38 papers)
  6. Jiawei Sheng (27 papers)
  7. Jianxin Li (128 papers)
Citations (27)

Summary

This paper introduces ACK-MMEA, a novel framework for Multi-Modal Entity Alignment (MMEA) designed to address the "contextual gap" problem inherent in aligning entities across different Multi-Modal Knowledge Graphs (MMKGs). The core challenge identified is that equivalent entities in different MMKGs often have inconsistent attributes – varying numbers of attributes for a given modality (e.g., text, image) or missing attributes entirely for certain modalities. This inconsistency makes it difficult for existing MMEA methods, which typically aggregate all available attributes, to learn accurate entity representations and perform reliable alignment.

To tackle this, ACK-MMEA proposes a two-stage approach:

  1. Multi-Modal Attribute Uniformization: This stage transforms the original MMKGs (KG_1, KG_2) into Attribute-Consistent KGs (ACKGs). The goal is to ensure every entity in an ACKG has exactly one attribute representation per modality (e.g., one text attribute, one image attribute). This is achieved through two operators:
    • Merge Operator: For entities with multiple attributes of the same modality, an attention mechanism aggregates these attributes into a single, representative feature vector, filtering out noise.
    • Generate Operator: For entities missing an attribute in a specific modality, a new attribute representation is generated by averaging the corresponding attribute representations of its first-order neighbors. This leverages the intuition that neighboring entities often share similar characteristics (both operators are sketched after this list).
  2. ConsistGNN: This Graph Neural Network (GNN) model learns entity and relation representations from the generated ACKGs.
    • Initialization: Initial node representations (entity, text, image) are obtained using standard methods (TransE, BERT, VGG16) and projected into a common space. Entity representations are initially formed by combining entity, text, and image embeddings.
    • Attribute-Consistent Relation Representation: Relation embeddings are enhanced by incorporating information from the corresponding attribute relations (difference between connected entities' attribute embeddings) for each modality.
    • Relation-aware Entity Representation: Entity embeddings are updated using a GNN layer that aggregates neighbor information. Crucially, it incorporates the enhanced relation representations learned previously. To improve robustness against potential noise introduced by the Generate Operator, random dropouts are applied to neighbors during aggregation (see the second sketch after this list).
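
As a rough illustration, the following sketch shows how the two uniformization operators could be implemented. It assumes PyTorch, and the function names (`merge_attributes`, `generate_attribute`) and tensor shapes are illustrative, not taken from the authors' code.

```python
# Minimal sketch of multi-modal attribute uniformization (assumed shapes/names).
import torch
import torch.nn.functional as F

def merge_attributes(attrs: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
    """Merge operator: attention-pool multiple same-modality attribute
    vectors (k x d) of one entity into a single d-dim feature."""
    scores = attrs @ query               # (k,) attention scores vs. the entity
    weights = F.softmax(scores, dim=0)   # (k,) normalized attention weights
    return weights @ attrs               # (d,) merged attribute feature

def generate_attribute(neighbor_attrs: torch.Tensor) -> torch.Tensor:
    """Generate operator: for an entity missing this modality, average the
    corresponding attribute vectors of its first-order neighbors (n x d)."""
    return neighbor_attrs.mean(dim=0)    # (d,) imputed attribute feature

# Toy usage: one entity with 3 text attributes, another with none.
entity_query = torch.randn(64)
uniform_text = merge_attributes(torch.randn(3, 64), entity_query)
missing_text = generate_attribute(torch.randn(5, 64))
```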

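The second sketch shows one relation-aware aggregation step with random neighbor dropout, in the spirit of ConsistGNN. The exact message function, the dropout probability, and the class name `RelationAwareLayer` are assumptions, not the paper's equations.

```python
# Sketch of a relation-aware GNN layer with random neighbor dropout (assumed form).
import torch
import torch.nn as nn

class RelationAwareLayer(nn.Module):
    def __init__(self, dim: int, drop_prob: float = 0.3):
        super().__init__()
        self.msg = nn.Linear(2 * dim, dim)  # message from (neighbor, relation)
        self.drop_prob = drop_prob

    def forward(self, ent_emb, rel_emb, triples):
        """ent_emb: (N, d) entities; rel_emb: (R, d) relations;
        triples: list of (head, relation, tail) index triples."""
        msgs, dst = [], []
        for h, r, t in triples:
            # Randomly drop neighbors during training to stay robust to
            # attributes imputed by the Generate operator.
            if self.training and torch.rand(()).item() < self.drop_prob:
                continue
            msgs.append(self.msg(torch.cat([ent_emb[t], rel_emb[r]], dim=-1)))
            dst.append(h)
        agg = torch.zeros_like(ent_emb)
        deg = torch.zeros(ent_emb.size(0), 1)
        if dst:
            idx = torch.tensor(dst)
            agg = agg.index_add(0, idx, torch.stack(msgs))
            deg = deg.index_add(0, idx, torch.ones(len(dst), 1))
        # Mean-aggregate relation-conditioned messages into each head entity.
        return torch.relu(ent_emb + agg / deg.clamp(min=1.0))
```
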
Finally, the model is trained using a Joint Alignment Loss function that combines three objectives (a simplified sketch follows the list):

  • Aligned Entity Similarity: Encourages aligned entity pairs to have similar embeddings while pushing negative pairs apart (using cosine distance and negative sampling).
  • Aligned Attribute Similarity: Prompts the corresponding uniformized attributes (text-text, image-image) of aligned entities to be similar.
  • Aligned Neighbor Dissimilarity: Inspired by contrastive learning, this loss pushes an entity's embedding closer to its aligned counterpart in the other KG and further away from the embeddings of the counterpart's neighbors.
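
The sketch below shows one way the three terms could be combined, assuming cosine distance, margin-based negative sampling, and a contrastive (softmax) form for the neighbor term. The margin `gamma` and the weights `lam_attr`, `lam_nbr` are placeholders, not the paper's reported hyperparameters.

```python
# Simplified sketch of a joint alignment loss (assumed forms and weights).
import torch
import torch.nn.functional as F

def cos_dist(a, b):
    return 1.0 - F.cosine_similarity(a, b, dim=-1)

def joint_alignment_loss(e1, e2, e2_neg, attr1, attr2, nbr2,
                         gamma=1.0, lam_attr=0.5, lam_nbr=0.5):
    """e1, e2: embeddings of aligned entity pairs (B, d);
    e2_neg: negative-sampled counterparts for e1 (B, d);
    attr1, attr2: uniformized attribute embeddings of one modality (B, d);
    nbr2: embeddings of each counterpart's neighbors (B, K, d)."""
    # (1) Aligned entity similarity: margin loss with negative sampling.
    l_ent = F.relu(cos_dist(e1, e2) - cos_dist(e1, e2_neg) + gamma).mean()
    # (2) Aligned attribute similarity for the same modality on both sides.
    l_attr = cos_dist(attr1, attr2).mean()
    # (3) Aligned neighbor dissimilarity: pull the counterpart close,
    #     push its neighbors away (contrastive-style).
    pos = F.cosine_similarity(e1, e2, dim=-1)                  # (B,)
    neg = F.cosine_similarity(e1.unsqueeze(1), nbr2, dim=-1)   # (B, K)
    logits = torch.cat([pos.unsqueeze(1), neg], dim=1)         # (B, K+1)
    l_nbr = F.cross_entropy(logits, torch.zeros(len(e1), dtype=torch.long))
    return l_ent + lam_attr * l_attr + lam_nbr * l_nbr
```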

Experiments on two benchmark datasets (FB15K-DB15K and FB15K-YAGO15K) demonstrate that ACK-MMEA significantly outperforms existing EA and MMEA baselines, particularly in Hits@1 and MRR metrics. Ablation studies confirm the effectiveness of the attribute uniformization process (both Merge and Generate operators), the ConsistGNN architecture with random dropouts, and the components of the joint loss function. The results suggest that explicitly addressing the attribute inconsistency (contextual gap) problem leads to more robust and accurate multi-modal entity alignment.