[Lions: 1] and [Tigers: 2] and [Bears: 3], Oh My! Literary Coreference Annotation with LLMs (2401.17922v1)

Published 31 Jan 2024 in cs.CL

Abstract: Coreference annotation and resolution are vital components of computational literary studies. However, it has previously been difficult to build high-quality systems for fiction: coreference requires complicated structured outputs, and literary text involves subtle inferences and highly varied language. New language-model-based seq2seq systems present the opportunity to solve both of these problems by learning to directly generate a copy of an input sentence with markdown-like annotations. We create, evaluate, and release several trained models for coreference, as well as a workflow for training new models.
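The abstract's key idea is that a seq2seq model emits a lightly marked-up copy of its input, with each mention bracketed and tagged with a cluster ID, as the paper's title itself illustrates. The sketch below shows how output in that style round-trips back into coreference clusters and plain text; note that the exact bracket syntax is an assumption inferred from the title, not taken from the released models' documentation.

```python
import re
from collections import defaultdict

# Markdown-like annotation style as illustrated by the paper's title:
# each mention is bracketed and tagged with its coreference cluster ID,
# e.g. "[Lions: 1]". (Assumed format, for illustration only.)
MENTION = re.compile(r"\[([^\[\]:]+):\s*(\d+)\]")

def parse_clusters(annotated: str) -> dict:
    """Group annotated mention strings by cluster ID."""
    clusters = defaultdict(list)
    for mention, cluster_id in MENTION.findall(annotated):
        clusters[int(cluster_id)].append(mention.strip())
    return dict(clusters)

def strip_annotations(annotated: str) -> str:
    """Recover the unannotated input sentence from the model output."""
    return MENTION.sub(lambda m: m.group(1).strip(), annotated)

text = "[Lions: 1] and [Tigers: 2] and [Bears: 3], oh my!"
print(parse_clusters(text))     # {1: ['Lions'], 2: ['Tigers'], 3: ['Bears']}
print(strip_annotations(text))  # Lions and Tigers and Bears, oh my!
```

Because the generated text stays close to the input, a regex round-trip like this suffices to recover both the clusters and the original sentence, which is what makes the copy-with-annotations formulation attractive for coreference's structured outputs.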
