Seq2seq is All You Need for Coreference Resolution (2310.13774v1)
Abstract: Existing works on coreference resolution suggest that task-specific models are necessary to achieve state-of-the-art performance. In this work, we present compelling evidence that such models are not necessary. We finetune a pretrained seq2seq transformer to map an input document to a tagged sequence encoding the coreference annotation. Despite its extreme simplicity, our model outperforms or closely matches the best coreference systems in the literature on an array of datasets. We also propose an especially simple seq2seq approach that generates only the tagged spans rather than the spans interleaved with the original text. Our analysis shows that the model size, the amount of supervision, and the choice of sequence representation are key factors in performance.
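To make the two target representations concrete, the sketch below linearizes a toy document in both ways: a full linearization that interleaves cluster tags with every token of the input, and a shorter variant that emits only the tagged mention spans. The tag format (`<m id=k>`), the helper names, and the inclusive-span convention are illustrative assumptions for this sketch, not the paper's exact encoding scheme.

```python
# Minimal sketch (not the paper's exact format) of two ways to linearize
# coreference annotations as seq2seq targets. Mentions are given as
# (start, end, cluster_id) token spans with inclusive indices.

def linearize_full(tokens, mentions):
    """Full linearization: copy every input token and wrap each mention
    in hypothetical <m id=k> ... </m> cluster tags."""
    opens, closes = {}, {}
    for start, end, cid in mentions:
        opens.setdefault(start, []).append(cid)
        closes.setdefault(end, []).append(cid)
    out = []
    for i, tok in enumerate(tokens):
        for cid in opens.get(i, []):
            out.append(f"<m id={cid}>")
        out.append(tok)
        for cid in closes.get(i, []):
            out.append("</m>")
    return " ".join(out)


def linearize_spans_only(tokens, mentions):
    """Shorter variant: emit only the tagged mention spans and drop the
    rest of the document text from the target."""
    out = []
    for start, end, cid in sorted(mentions):
        span = " ".join(tokens[start:end + 1])
        out.append(f"<m id={cid}> {span} </m>")
    return " ".join(out)


if __name__ == "__main__":
    tokens = "John said he would bring his laptop".split()
    # "John", "he", "his" corefer -> cluster 0
    mentions = [(0, 0, 0), (2, 2, 0), (5, 5, 0)]
    print(linearize_full(tokens, mentions))
    # <m id=0> John </m> said <m id=0> he </m> would bring <m id=0> his </m> laptop
    print(linearize_spans_only(tokens, mentions))
    # <m id=0> John </m> <m id=0> he </m> <m id=0> his </m>
```

Either string can then serve as the decoder target when finetuning a standard pretrained seq2seq transformer on (document, tagged sequence) pairs; the spans-only variant simply yields shorter targets.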