
Few-Shot Relation Extraction with Hybrid Visual Evidence (2403.00724v1)

Published 1 Mar 2024 in cs.CL and cs.CV

Abstract: The goal of few-shot relation extraction is to predict relations between named entities in a sentence when only a few labeled instances are available for training. Existing few-shot relation extraction methods focus on uni-modal information, such as text only, which reduces performance when the text provides no clear context between the named entities. We propose a multi-modal few-shot relation extraction model (MFS-HVE) that leverages both textual and visual semantic information to jointly learn a multi-modal representation. MFS-HVE consists of semantic feature extractors and a multi-modal fusion component. The semantic feature extractors extract both textual and visual features; the visual features include global image features and local object features within the image. The multi-modal fusion unit integrates information across modalities using image-guided attention, object-guided attention, and hybrid feature attention to fully capture the semantic interaction between visual regions of images and the relevant text. Extensive experiments on two public datasets demonstrate that semantic visual information significantly improves the performance of few-shot relation prediction.
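
The abstract names three fusion mechanisms (image-guided attention, object-guided attention, and hybrid feature attention) without spelling out how they might combine. Below is a minimal PyTorch sketch of one plausible reading: text token features are attended over by a global image feature and by local object features, and a learned gate merges the two attended views. All module names, dimensions, and the gating design are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class HybridVisualFusion(nn.Module):
    """Hypothetical sketch of MFS-HVE-style fusion.

    Text features are contextualized by (a) a global image feature
    (image-guided attention) and (b) local object features
    (object-guided attention); a sigmoid gate blends the two views
    (a stand-in for the paper's hybrid feature attention).
    """

    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        self.image_guided = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.object_guided = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, text, image_global, objects):
        # text:         (B, T, D) token features, e.g. from BERT
        # image_global: (B, 1, D) one global feature per image, e.g. from a CNN
        # objects:      (B, K, D) K local object features, e.g. from a detector
        img_ctx, _ = self.image_guided(text, image_global, image_global)
        obj_ctx, _ = self.object_guided(text, objects, objects)
        g = self.gate(torch.cat([img_ctx, obj_ctx], dim=-1))  # per-dimension blend weight
        fused = g * img_ctx + (1 - g) * obj_ctx
        return fused + text  # residual keeps the textual signal intact

# Toy usage with random features.
fusion = HybridVisualFusion()
out = fusion(torch.randn(2, 16, 768), torch.randn(2, 1, 768), torch.randn(2, 5, 768))
print(out.shape)  # torch.Size([2, 16, 768])
```

The residual connection reflects the abstract's framing that visual evidence supplements, rather than replaces, the textual context between the two entities.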

Authors (2)
  1. Jiaying Gong (8 papers)
  2. Hoda Eldardiry (31 papers)
