$\textit{BenchIE}^{FL}$ : A Manually Re-Annotated Fact-Based Open Information Extraction Benchmark (2407.16860v1)
Abstract: Open Information Extraction (OIE) is a field of natural language processing that aims to present textual information in a format that allows it to be organized, analyzed and reflected upon. Numerous OIE systems have been developed, each claiming ever-increasing performance, which underscores the need for objective benchmarks. BenchIE is the latest reference we know of. Although it is very well thought out, we noticed a number of issues that we believe are limiting. We therefore propose $\textit{BenchIE}^{FL}$, a new OIE benchmark that fully enforces the principles of BenchIE while containing fewer errors, omissions and shortcomings when candidate facts are matched against reference ones. $\textit{BenchIE}^{FL}$ allows insightful conclusions to be drawn about the actual performance of OIE extractors.