
A Dual-way Enhanced Framework from Text Matching Point of View for Multimodal Entity Linking (2312.11816v2)

Published 19 Dec 2023 in cs.AI and cs.CV

Abstract: Multimodal Entity Linking (MEL) aims to link ambiguous mentions, together with their multimodal information, to entities in a Knowledge Graph (KG) such as Wikipedia, and plays a key role in many applications. However, existing methods suffer from shortcomings, including modality impurity, such as noise in raw images, and ambiguous textual entity representations, which hinder MEL. We formulate multimodal entity linking as a neural text matching problem in which each piece of multimodal information (text and image) is treated as a query, and the model learns the mapping from each query to the relevant entity among the candidate entities. This paper introduces a dual-way enhanced (DWE) framework for MEL: (1) Our model refines queries with multimodal data and addresses semantic gaps using cross-modal enhancers between text and image information. In addition, DWE leverages fine-grained image attributes, including facial characteristics and scene features, to enhance and refine visual features. (2) By using Wikipedia descriptions, DWE enriches entity semantics and obtains a more comprehensive textual representation, which reduces the gap between the textual representation and the entities in the KG. Extensive experiments on three public benchmarks demonstrate that our method achieves state-of-the-art (SOTA) performance, indicating the superiority of our model. The code is released at https://github.com/season1blue/DWE
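
The matching formulation in the abstract (each modality acts as a query that is scored against candidate entities) can be illustrated with a toy sketch. This is not the DWE implementation: the hand-written vectors, the cosine scoring, the equal-weight averaging of the two queries, and the names `cosine` and `link_mention` are all assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def link_mention(text_query, image_query, candidates):
    """Score every candidate entity against both queries and return the best one.

    Assumption: the text and image scores are simply averaged; a real model
    would learn how to combine them.
    """
    scores = {
        name: 0.5 * (cosine(text_query, vec) + cosine(image_query, vec))
        for name, vec in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy, hand-written embeddings (hypothetical values, not model outputs).
candidates = {
    "Apple Inc.": [1.0, 0.0, 0.0],
    "Apple (fruit)": [0.0, 1.0, 0.0],
}
text_query = [0.9, 0.1, 0.0]   # stands in for the encoded mention text
image_query = [0.8, 0.2, 0.0]  # stands in for the encoded mention image

best, scores = link_mention(text_query, image_query, candidates)
print(best)
```

In this sketch both queries point mostly along the first axis, so the first candidate receives the higher averaged score. The entity vectors here play the role that the abstract assigns to entity representations enriched with Wikipedia descriptions.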

Authors (7)
  1. Shezheng Song (12 papers)
  2. Shan Zhao (32 papers)
  3. Chengyu Wang (93 papers)
  4. Tianwei Yan (6 papers)
  5. Shasha Li (57 papers)
  6. Xiaoguang Mao (27 papers)
  7. Meng Wang (1063 papers)
Citations (5)
