- The paper introduces a neural network that learns entity-level representations to merge coreference clusters effectively.
- It employs a novel two-stage pretraining and easy-first clustering strategy, reducing error propagation in cluster merges.
- Experimental results on the English and Chinese portions of the CoNLL 2012 Shared Task demonstrate significant improvements over previous state-of-the-art models.
Improving Coreference Resolution with Entity-Level Distributed Representations
This paper addresses a fundamental challenge in coreference resolution: effectively using entity-level information. Traditional coreference systems mostly score mention pairs in isolation, which limits their ability to consult the entities built up so far. This research introduces a novel neural network architecture that builds distributed, high-dimensional vector representations of pairs of coreference clusters, giving the model flexible access to entity-level features.
Neural Network Architecture and Learning Mechanism
The authors propose a neural system that learns when merging two clusters is beneficial, while requiring far less manual feature engineering than previous entity-level approaches. At its core is a learning-to-search algorithm inspired by SEARN, which trains local decisions (cluster merges) to optimize the quality of the final coreference partition. The system surpasses previous state-of-the-art results on the English and Chinese portions of the CoNLL 2012 Shared Task dataset.
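To make the learning-to-search idea concrete, here is a small, self-contained Python sketch of the SEARN-style principle: estimate the cost of each local action by rolling out the current policy to a complete clustering and measuring how far the result is from the gold partition. The toy state, cost function, and policy below are illustrative assumptions, not the authors' implementation.

```python
def rollout_cost(clusters, remaining, policy, gold):
    """Finish clustering `remaining` mentions with `policy`; return cost vs. gold."""
    clusters = [set(c) for c in clusters]
    for m in remaining:
        target = policy(clusters, m)          # index of cluster to join, or None
        if target is None:
            clusters.append({m})
        else:
            clusters[target].add(m)
    # Toy cost: number of mention pairs whose coreference status disagrees with gold.
    def linked(cs, a, b):
        return any(a in c and b in c for c in cs)
    ms = sorted(set().union(*clusters))
    return sum(linked(clusters, a, b) != linked(gold, a, b)
               for i, a in enumerate(ms) for b in ms[i + 1:])

def action_costs(clusters, mention, remaining, policy, gold):
    """Cost of each candidate action: join an existing cluster, or start a new one."""
    costs = {}
    for i in range(len(clusters)):
        trial = [set(c) for c in clusters]
        trial[i].add(mention)
        costs[i] = rollout_cost(trial, remaining, policy, gold)
    costs[None] = rollout_cost(clusters + [{mention}], remaining, policy, gold)
    return costs

# Toy usage: with gold [{0, 1, 2}, {3}], joining mention 1 to {0} costs less
# than starting a new cluster, so training pushes the model toward the merge.
gold = [{0, 1, 2}, {3}]
greedy = lambda clusters, m: 0                # toy rollout policy: join first cluster
print(action_costs([{0}], mention=1, remaining=[2, 3], policy=greedy, gold=gold))
```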
The system comprises three main components (sketched in code after the list):
- Mention-Pair Encoder: a feedforward neural network that maps features of a candidate antecedent and a mention to a distributed representation of the pair.
- Cluster-Pair Encoder: lifts mention-level representations to the cluster level by pooling the mention-pair representations for all mention pairs spanning the two clusters, capturing richer entity-level information.
- Cluster-Ranking Model: scores cluster pairs with a single neural network layer over the cluster-pair representation, guiding which clusters to merge.
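A minimal PyTorch sketch of the three components. The layer sizes, feature set, and exact pooling choices are placeholder assumptions; the paper describes feedforward encoders with pooling over mention-pair representations.

```python
import torch
import torch.nn as nn

class MentionPairEncoder(nn.Module):
    """Maps features of one (antecedent, mention) pair to a vector."""
    def __init__(self, feature_dim, hidden_dim=1000, out_dim=500):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, out_dim), nn.ReLU(),
        )

    def forward(self, pair_features):           # (num_pairs, feature_dim)
        return self.net(pair_features)           # (num_pairs, out_dim)

def encode_cluster_pair(pair_reprs):
    """Pools the mention-pair vectors for all cross-cluster mention pairs into
    one fixed-size cluster-pair representation (max + average pooling)."""
    return torch.cat([pair_reprs.max(dim=0).values, pair_reprs.mean(dim=0)])

class ClusterRanker(nn.Module):
    """Scores a cluster-pair representation with a single linear layer."""
    def __init__(self, in_dim):
        super().__init__()
        self.out = nn.Linear(in_dim, 1)

    def forward(self, cluster_pair_repr):
        return self.out(cluster_pair_repr).squeeze(-1)

# Toy usage: 6 mention pairs across two clusters, 64 input features each.
pairs = MentionPairEncoder(feature_dim=64)(torch.randn(6, 64))
score = ClusterRanker(in_dim=2 * 500)(encode_cluster_pair(pairs))
```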
The mention-ranking model, a precursor to the cluster-ranking model, supplies its initial weights and acts as a fast pruner of candidate cluster merges, which keeps the full system efficient.
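One simple way such pruning can work, as a hedged sketch: the margin threshold and score function below are assumptions for illustration, not the paper's exact procedure.

```python
def prune_antecedents(mention, candidates, mention_rank_score, margin=1.5):
    """Keep only candidate antecedents whose mention-ranking score is within
    `margin` of the best candidate, so the cluster ranker sees fewer options."""
    if not candidates:
        return []
    scores = {c: mention_rank_score(c, mention) for c in candidates}
    best = max(scores.values())
    return [c for c in candidates if scores[c] >= best - margin]
```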
Training Methodology and Experiments
Training the mention-ranking model uses a two-stage pretraining scheme, moving from an easier binary pair-classification objective to the harder max-margin ranking objective. This staged pretraining is shown to substantially improve accuracy, underscoring its importance for effective model initialization.
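A minimal sketch of the two objectives in PyTorch. The slack-rescaled max-margin form follows the paper's description; the tensor shapes, the per-mistake cost values, and the assumption that at least one gold antecedent exists are simplifications here.

```python
import torch
import torch.nn.functional as F

def all_pairs_loss(scores, labels):
    """Stage 1: binary classification over every candidate (antecedent, mention) pair."""
    return F.binary_cross_entropy_with_logits(scores, labels.float())

def max_margin_loss(scores, gold_mask, costs):
    """Stage 2: slack-rescaled max-margin ranking for one mention.

    scores:    (num_candidates,) model scores for each candidate antecedent
    gold_mask: (num_candidates,) bool, True where the link is correct
    costs:     (num_candidates,) mistake-specific cost for each candidate,
               assumed 0 for correct links so they contribute no slack
    """
    best_gold = scores[gold_mask].max()          # highest-scoring correct link
    slack = costs * torch.clamp(1.0 + scores - best_gold, min=0.0)
    return slack.max()
```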
For training the cluster-ranking model, the paper introduces an easy-first clustering procedure: mentions are processed in descending order of the score of their highest-scoring candidate coreference link, so the most confident merges happen first. Delaying hard decisions until more cluster structure is available reduces the cascading effect of early errors.
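A self-contained sketch of easy-first merging. Here `score_fn` stands in for the cluster-ranking model's merge score and `new_cluster_score` for its "keep the mention separate" alternative; both names and the toy scoring in the demo are illustrative assumptions.

```python
def easy_first_clustering(num_mentions, score_fn, new_cluster_score=0.0):
    cluster_of = list(range(num_mentions))       # cluster id for each mention
    clusters = {i: {i} for i in range(num_mentions)}

    def best_merge(i):
        """Highest-scoring merge of mention i's cluster with an earlier one."""
        options = [(score_fn(clusters[cluster_of[j]], clusters[cluster_of[i]]),
                    cluster_of[j])
                   for j in range(i) if cluster_of[j] != cluster_of[i]]
        return max(options, default=(float("-inf"), None))

    # Easy-first ordering: the most confident best-links are resolved first,
    # postponing hard decisions until more cluster context is available.
    order = sorted(range(1, num_mentions), key=lambda i: best_merge(i)[0],
                   reverse=True)
    for i in order:
        score, target = best_merge(i)            # re-scored after earlier merges
        if target is not None and score > new_cluster_score:
            src = cluster_of[i]
            clusters[target] |= clusters[src]
            for m in clusters[src]:
                cluster_of[m] = target
            del clusters[src]
    return list(clusters.values())

# Toy usage: merge clusters whenever their mentions share parity.
same_parity = lambda c1, c2: 1.0 if {m % 2 for m in c1} == {m % 2 for m in c2} else -1.0
print(easy_first_clustering(5, same_parity))     # -> [{0, 2, 4}, {1, 3}]
```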
Key experimental results show that the cluster-ranking model, which leverages entity-level representations, outperforms its mention-ranking counterpart, with particularly large gains on CEAFϕ4, an entity-based evaluation metric introduced to address shortcomings of the earlier MUC and B³ measures.
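For reference, CEAFϕ4 aligns gold and system entities one-to-one under the similarity ϕ4(K, R) = 2|K ∩ R| / (|K| + |R|) and scores the optimal alignment. A short sketch using SciPy's assignment solver:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ceaf_phi4(gold_clusters, system_clusters):
    """CEAF_phi4: optimal one-to-one entity alignment under
    phi4(K, R) = 2|K & R| / (|K| + |R|)."""
    phi = np.array([[2 * len(k & r) / (len(k) + len(r)) for r in system_clusters]
                    for k in gold_clusters])
    rows, cols = linear_sum_assignment(-phi)     # maximize total similarity
    total = phi[rows, cols].sum()
    # phi4(K, K) = 1, so the denominators reduce to the cluster counts.
    recall, precision = total / len(gold_clusters), total / len(system_clusters)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy usage: two gold and two system entities, partially overlapping.
print(ceaf_phi4([{0, 1, 2}, {3, 4}], [{0, 1}, {2, 3, 4}]))   # -> (0.8, 0.8, 0.8)
```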
Implications and Future Directions
The framework's use of distributed representations to dynamically capture entity-level semantics marks a clear advance in coreference modeling. Learned embeddings can capture semantic relationships that are difficult to express through hand-engineered features.
This research opens several avenues for future work. The generalization ability of neural networks could enable multilingual coreference systems with minimal manual adaptation, and more expressive architectures such as transformers could further enrich the learned representations.
Overall, by moving from static, hand-crafted features to dynamically learned entity representations, this work offers a robust path toward more accurate coreference resolution and a foundation for further progress.