
Hierarchical Aligned Multimodal Learning for NER on Tweet Posts (2305.08372v2)

Published 15 May 2023 in cs.CL and cs.MM

Abstract: Mining structured knowledge from tweets using named entity recognition (NER) can benefit many downstream applications, such as recommendation and intention understanding. Because tweets tend to be multimodal, multimodal named entity recognition (MNER) has attracted growing attention. In this paper, we propose a novel approach that dynamically aligns the image and the text sequence and performs multi-level cross-modal learning to augment textual word representations for MNER. Specifically, our framework has three main stages: the first focuses on intra-modality representation learning to derive the implicit global and local knowledge of each modality; the second evaluates the relevance between the text and its accompanying image and integrates visual information of different granularities based on that relevance; the third enforces semantic refinement via iterative cross-modal interactions and co-attention. We conduct experiments on two open datasets, and the results and detailed analysis demonstrate the advantage of our model.
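
The second stage described in the abstract (scoring text-image relevance and using it to control how much visual information reaches each word) can be illustrated with a toy sketch. This is not the authors' model. It assumes the first-stage encoders have already produced one vector per token and per image region, it uses a single visual granularity, and it picks a simple form for the relevance score (cosine similarity of mean-pooled vectors, rescaled to [0, 1]). The function names `relevance_gate` and `fuse` are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def relevance_gate(tokens, regions):
    # Assumed relevance score: cosine similarity between the mean-pooled
    # text and image vectors, rescaled from [-1, 1] to [0, 1].
    return (cosine(mean_vector(tokens), mean_vector(regions)) + 1.0) / 2.0


def fuse(tokens, regions):
    # Each token attends over the image regions (scaled dot-product
    # attention); the attended visual vector is added to the token,
    # scaled by the text-image relevance gate.
    gate = relevance_gate(tokens, regions)
    scale = math.sqrt(len(tokens[0]))
    fused = []
    for tok in tokens:
        weights = softmax([dot(tok, r) / scale for r in regions])
        visual = [sum(w * r[i] for w, r in zip(weights, regions))
                  for i in range(len(tok))]
        fused.append([t + gate * v for t, v in zip(tok, visual)])
    return gate, fused


if __name__ == "__main__":
    # Hypothetical 2-dimensional features for two tokens and two regions.
    gate, fused = fuse([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]])
    print(gate, fused)
```

When the pooled text and image vectors point in opposite directions, the gate is 0 and the token vectors pass through unchanged, which is the behaviour one would want for an unrelated image. The third stage (iterative cross-modal interaction and co-attention) is not covered by this sketch.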

Authors (7)
  1. Peipei Liu (14 papers)
  2. Hong Li (216 papers)
  3. Yimo Ren (7 papers)
  4. Jie Liu (492 papers)
  5. Shuaizong Si (1 paper)
  6. Hongsong Zhu (19 papers)
  7. Limin Sun (32 papers)
Citations (1)
