CORE: A Few-Shot Company Relation Classification Dataset for Robust Domain Adaptation (2310.12024v1)

Published 18 Oct 2023 in cs.CL

Abstract: We introduce CORE, a dataset for few-shot relation classification (RC) focused on company relations and business entities. CORE includes 4,708 instances of 12 relation types with corresponding textual evidence extracted from company Wikipedia pages. Company names and business entities pose a challenge for few-shot RC models due to the rich and diverse information associated with them. For example, a company name may represent the legal entity, products, people, or business divisions depending on the context. Therefore, deriving the relation type between entities is highly dependent on textual context. To evaluate the performance of state-of-the-art RC models on the CORE dataset, we conduct experiments in the few-shot domain adaptation setting. Our results reveal substantial performance gaps, confirming that models trained on different domains struggle to adapt to CORE. Interestingly, we find that models trained on CORE show improved out-of-domain performance, which highlights the importance of high-quality data for robust domain adaptation. Specifically, the information richness embedded in business entities allows models to focus on contextual nuances, reducing their reliance on superficial clues such as relation-specific verbs. In addition to the dataset, we provide relevant code snippets to facilitate reproducibility and encourage further research in the field.
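
The evaluation described in the abstract follows the episodic N-way K-shot protocol that is standard in few-shot relation classification (as popularized by FewRel). The sketch below is a minimal illustration of how such an episode can be assembled from a pool of labeled instances; it is not the authors' released code, and the `relation`/`text` field names are assumptions, since CORE's exact data schema is not reproduced on this page.

```python
import random
from collections import defaultdict

def sample_episode(instances, n_way=5, k_shot=1, n_query=1, seed=None):
    """Build one N-way K-shot episode for few-shot relation classification.

    `instances` is assumed to be a list of dicts carrying at least a
    "relation" label and a "text" field with the evidence sentence and
    marked entity mentions (CORE annotates 12 relation types).
    """
    rng = random.Random(seed)

    # Group the instance pool by relation type.
    by_relation = defaultdict(list)
    for inst in instances:
        by_relation[inst["relation"]].append(inst)

    # Keep relations with enough instances for both support and query sets.
    eligible = [r for r, xs in by_relation.items() if len(xs) >= k_shot + n_query]
    assert len(eligible) >= n_way, "not enough relation types in the pool"

    support, query = [], []
    for rel in rng.sample(eligible, n_way):
        picked = rng.sample(by_relation[rel], k_shot + n_query)
        support.extend(picked[:k_shot])  # K labeled examples per relation
        query.extend(picked[k_shot:])    # held-out instances to classify
    return support, query
```

In the few-shot domain adaptation setting, support and query instances would be drawn from the target domain (here, CORE's business relations) while the model is trained on episodes from a different source domain, which is what exposes the performance gaps the abstract reports.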

Authors (5)
  1. Philipp Borchert (7 papers)
  2. Jochen De Weerdt (22 papers)
  3. Kristof Coussement (2 papers)
  4. Arno De Caigny (2 papers)
  5. Marie-Francine Moens (102 papers)
Citations (1)
