
A Survey of Document-Level Information Extraction (2309.13249v1)

Published 23 Sep 2023 in cs.CL

Abstract: Document-level information extraction (IE) is a crucial task in NLP. This paper conducts a systematic review of recent document-level IE literature. In addition, we conduct a thorough error analysis of current state-of-the-art algorithms and identify their limitations as well as the remaining challenges for document-level IE. According to our findings, labeling noise, entity coreference resolution, and lack of reasoning severely affect the performance of document-level IE. The objective of this survey is to provide more insights and help NLP researchers further enhance document-level IE performance.

Authors (3)
  1. Hanwen Zheng
  2. Sijia Wang
  3. Lifu Huang
