End-to-end Neural Coreference Resolution
The paper "End-to-end Neural Coreference Resolution" presents a neural model that performs coreference resolution end to end, without relying on syntactic parsers or hand-engineered mention detection. This design avoids the cascading errors common in pipelined systems and achieves state-of-the-art results on the OntoNotes benchmark.
Core Contributions
- Span-based Architecture: The core innovation of the proposed model is the treatment of all possible spans in a document as potential mentions. The model learns distributions over possible antecedents for each span. Span embeddings are constructed by combining context-dependent boundary representations with a head-finding attention mechanism.
- Optimization: Because antecedent links are not directly annotated, the model is trained to maximize the marginal log-likelihood of all antecedents consistent with the gold coreference clusters. Together with a factored scoring function, this objective supports aggressive pruning of candidate mentions, and it lets span detection and coreference clustering be learned jointly from the data, improving both performance and interpretability.
- Empirical Results: Experiments demonstrate robust performance, achieving a gain of 1.5 F1 on the OntoNotes benchmark with a single model and 3.1 F1 using a 5-model ensemble. These improvements are attained without using external resources, highlighting the power of the end-to-end approach.
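Since the correct antecedent links are latent, the training objective marginalizes over every antecedent consistent with the gold clustering. A minimal sketch of that objective for a single span, in plain Python (the function name and the convention that index 0 is the dummy antecedent ε are illustrative choices, not taken from the paper's code):

```python
import math

def marginal_log_likelihood(antecedent_scores, gold_antecedents):
    """Marginal log-likelihood of the gold antecedents for one span.

    antecedent_scores: coreference scores s(i, j) for each candidate
        antecedent of span i. Index 0 is reserved for the dummy
        antecedent epsilon (fixed score 0, meaning "no antecedent").
    gold_antecedents: indices of candidates in the span's gold cluster,
        or [0] if the span is not a gold mention.
    """
    scores = [0.0] + list(antecedent_scores)  # prepend dummy epsilon
    # log-sum-exp over all candidates (the normalizer)
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    # marginalize (log-sum-exp) over the gold antecedents only
    gold = [scores[j] for j in gold_antecedents]
    mg = max(gold)
    log_gold = mg + math.log(sum(math.exp(s - mg) for s in gold))
    return log_gold - log_z  # maximize this (minimize its negative)
```

When a span is not a gold mention, its only correct "antecedent" is the dummy ε, so training pushes unlikely spans toward predicting no link, which is what makes learned pruning possible.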
Model Architecture
The architecture is designed to score spans and their antecedents efficiently:
- Span Representation: Each span is represented using context-dependent embeddings generated by a bidirectional LSTM, combined with a head-finding attention mechanism to determine relevant words within the span.
- Scoring Mechanisms: Unary scores for mentions and pairwise antecedent scores are computed via feed-forward neural networks. These scores are then combined to produce final coreference decisions.
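The span representation described above can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions: the function name and array shapes are mine, and the full model wraps learned feature embeddings, FFNN scorers, and a learned attention projection around this core:

```python
import numpy as np

def span_embedding(lstm_states, attn_scores, start, end, width_emb):
    """Span representation in the style of the paper: the BiLSTM states
    at the span boundaries, an attention-weighted sum over the span's
    words (a soft head word), and a span-width feature embedding.

    lstm_states: (num_words, hidden) context-dependent word states.
    attn_scores: (num_words,) unary word scores for head attention.
    """
    x_start, x_end = lstm_states[start], lstm_states[end]
    # head-finding attention: softmax over word scores inside the span
    a = attn_scores[start:end + 1]
    a = np.exp(a - a.max())
    a /= a.sum()
    x_head = a @ lstm_states[start:end + 1]
    return np.concatenate([x_start, x_end, x_head, width_emb])
```

The unary mention score and the pairwise antecedent score are then feed-forward networks over these embeddings (and their elementwise products), and the final coreference score sums the two unary scores with the pairwise score.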
Comparison with Prior Work
Traditional coreference models, including both non-neural and recent neural approaches, typically employ syntactic parsers to extract head-word features and propose candidate mentions. The reliance on parsers inherently introduces errors cascading from one stage to another. Additionally, these systems often struggle with generalizability across different languages due to linguistically specific rule sets. The end-to-end model addresses these limitations by directly learning from the data, bypassing the need for hand-engineered features and parsing.
Span Pruning and Mention Detection
Scoring every possible span pair is computationally infeasible: the number of spans is quadratic in document length, and the number of span pairs quartic. The model therefore aggressively prunes candidates using the learned unary mention scores. The experiments show that mention recall remains high (92.7% of gold mentions) even under this pruning, highlighting the efficacy of jointly learned span detection and mention scoring.
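A minimal sketch of this pruning step (the function name and data layout are hypothetical; the budget of λ = 0.4 spans per word and the maximum span width of 10 follow the paper):

```python
def prune_spans(spans, mention_scores, num_words, lam=0.4, max_width=10):
    """Keep the top lam * num_words candidate spans by unary mention
    score, after discarding spans wider than max_width words.

    spans: list of (start, end) word indices, inclusive.
    mention_scores: one learned unary score per span.
    """
    budget = int(lam * num_words)
    candidates = [(span, score)
                  for span, score in zip(spans, mention_scores)
                  if span[1] - span[0] + 1 <= max_width]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [span for span, _ in candidates[:budget]]
```

Because the mention scorer is trained end to end with the coreference objective rather than against gold mention boundaries, spans survive pruning exactly when they are likely to participate in a coreference link.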
Impact of Different Components
Ablation studies demonstrate that each component (the head-finding attention mechanism, pre-trained word embeddings, and span features) contributes measurably to the model's performance. Notably, the attention weights often align with traditional syntactic head definitions while also capturing task-specific information, which improves the model's interpretability.
Practical Implications
This research suggests a shift in coreference resolution toward purely data-driven, end-to-end models that can outperform traditional, heavily engineered approaches. The methodology could extend to other NLP tasks, pointing toward unified neural approaches that simplify pipelines and reduce error propagation.
Future Directions
Potential future developments might include integrating world knowledge and more complex entity-level inference mechanisms to address the limitations observed in the current model, especially in handling ambiguous references or ensuring contextual appropriateness. Moreover, augmenting the training dataset or enhancing span representations could further improve accuracy and robustness.
In conclusion, the end-to-end approach presented in this paper represents a significant advancement in coreference resolution, offering a more accurate, interpretable, and streamlined methodology that sets a new benchmark in the field.