Deep Reinforcement Learning for Mention-Ranking Coreference Models (1609.08667v3)

Published 27 Sep 2016 in cs.CL

Abstract: Coreference resolution systems are typically trained with heuristic loss functions that require careful tuning. In this paper we instead apply reinforcement learning to directly optimize a neural mention-ranking model for coreference evaluation metrics. We experiment with two approaches: the REINFORCE policy gradient algorithm and a reward-rescaled max-margin objective. We find the latter to be more effective, resulting in significant improvements over the current state-of-the-art on the English and Chinese portions of the CoNLL 2012 Shared Task.

Authors (2)
  1. Kevin Clark (16 papers)
  2. Christopher D. Manning (169 papers)
Citations (369)

Summary

Deep Reinforcement Learning for Mention-Ranking Coreference Models: A Summary

In pursuit of better coreference resolution systems, the paper "Deep Reinforcement Learning for Mention-Ranking Coreference Models" by Kevin Clark and Christopher D. Manning presents an approach built on reinforcement learning. Traditional coreference systems typically employ heuristic loss functions whose hyperparameters require careful tuning for each dataset and language. This paper instead uses reinforcement learning to directly optimize neural mention-ranking models for coreference evaluation metrics.

Core Contributions

The paper's primary contribution is the application of reinforcement learning techniques to the coreference problem, enhancing the training efficiency and effectiveness of mention-ranking models. Two methods are examined: the policy-gradient REINFORCE algorithm and a reward-rescaled max-margin objective. Notably, the reward-rescaled approach yields significant improvements over state-of-the-art systems on the English and Chinese portions of the CoNLL 2012 Shared Task.

Approach Details

  1. Neural Mention-Ranking Model: The paper builds on a mention-ranking model that scores candidate antecedents for each mention with a neural network. Because the model scores mention pairs independently, the cost of each individual antecedent decision can be assessed, which makes the model well suited to reinforcement learning.
  2. Reinforcement Learning Applications: Training directly optimizes for coreference metrics, using the B³ (B-cubed) evaluation score as the reward. The reward-rescaled max-margin loss scales the penalty for each candidate action by the exact change in final reward that taking it would cause (minimal sketches of both objectives appear after this list).
  3. Evaluation: Extensive evaluations show that the reward-rescaled objective consistently outperforms both the REINFORCE algorithm and the heuristic losses traditionally used.
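
To make the reward-rescaled objective concrete, here is a minimal sketch of a slack-rescaled max-margin loss for one document. It assumes the mention-ranking scores, the mask of correct antecedents, and the per-action reward costs are supplied as PyTorch tensors; the function and argument names, and the tensor shapes, are illustrative and not taken from the authors' implementation.

```python
import torch

def reward_rescaled_max_margin_loss(scores, gold_mask, action_costs):
    """Slack-rescaled max-margin loss for one document (illustrative sketch).

    scores:       (num_mentions, num_candidates) pairwise scores s(a, m_i) from
                  the neural scorer; candidates include a "new cluster" action.
    gold_mask:    (num_mentions, num_candidates) boolean mask marking the
                  correct antecedent actions for each mention.
    action_costs: (num_mentions, num_candidates) cost of each action, here the
                  drop in the final reward (e.g. the B³ score) caused by
                  switching mention i's action while keeping all other actions
                  fixed; correct actions have cost 0.
    """
    # Highest-scoring correct antecedent for each mention.
    best_gold = scores.masked_fill(~gold_mask, float("-inf")).max(dim=1).values

    # Margin violation of every candidate action, rescaled by its cost.
    violations = action_costs * torch.clamp(1.0 + scores - best_gold.unsqueeze(1), min=0.0)

    # Each mention contributes its single worst (cost-weighted) violation.
    return violations.max(dim=1).values.sum()
```

Filling `action_costs` requires re-evaluating the document's clustering once per alternative action, and this exact per-decision cost is what replaces the hand-tuned error penalties of the heuristic loss.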

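For comparison, a REINFORCE-style surrogate loss for the same model might look as follows. Here `reward_fn` is a hypothetical stand-in for whatever routine maps a full set of sampled antecedent choices to a scalar metric such as B³, and `baseline` is a simple variance-reduction term; both names are introduced here for illustration only.

```python
import torch

def reinforce_loss(scores, reward_fn, baseline=0.0):
    """Policy-gradient (REINFORCE) surrogate loss for one document (sketch).

    scores:    (num_mentions, num_candidates) pairwise scores treated as the
               logits of an independent antecedent distribution per mention.
    reward_fn: maps the sampled antecedent choices for the whole document to a
               scalar reward, e.g. the B³ score of the induced clustering.
    baseline:  scalar subtracted from the reward to reduce gradient variance.
    """
    policy = torch.distributions.Categorical(logits=scores)
    actions = policy.sample()          # one antecedent choice per mention
    reward = reward_fn(actions)        # score the resulting clustering

    # Minimizing this surrogate follows the REINFORCE gradient estimate
    # (reward - baseline) * grad log p(actions).
    return -(reward - baseline) * policy.log_prob(actions).sum()
```

Because the gradient here depends on sampled actions rather than on exact per-action costs, it is a noisier training signal, which may help explain why the reward-rescaled objective fares better in the evaluation described above.
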
Numerical Results

The experimental results clearly show the superior performance of the reward-rescaling method. On the English portion of the CoNLL 2012 test set, the reward-rescaled method achieves an average F1 score of 65.73, surpassing the heuristic-loss baselines and the previous state of the art. On the Chinese portion, the method similarly outperforms the alternatives, reaching an average F1 score of 63.88.

Implications and Future Directions

The application of reinforcement learning to mention ranking in coreference resolution has both practical and theoretical implications. Practically, the approach removes much of the onerous hyperparameter tuning required by heuristic losses, offering a more robust tool that adapts to different datasets and languages. Theoretically, it invites the use of reinforcement learning across other structured prediction tasks beyond coreference resolution. The paper suggests that further work could integrate additional coreference metrics directly into the learning objective or improve the sampling techniques used in policy optimization. Future research could also refine architectures to more fully exploit the expressivity of neural models, paving the way for further gains in structured prediction.