Coreferential Reasoning Learning for Language Representation (2004.06870v2)

Published 15 Apr 2020 in cs.CL

Abstract: Language representation models such as BERT can effectively capture contextual semantic information from plain text, and have been shown to achieve promising results on many downstream NLP tasks with appropriate fine-tuning. However, most existing language representation models cannot explicitly handle coreference, which is essential to the coherent understanding of the whole discourse. To address this issue, we present CorefBERT, a novel language representation model that can capture the coreferential relations in context. The experimental results show that, compared with existing baseline models, CorefBERT can achieve significant improvements consistently on various downstream NLP tasks that require coreferential reasoning, while maintaining comparable performance to previous models on other common NLP tasks. The source code and experiment details of this paper can be obtained from https://github.com/thunlp/CorefBERT.

Authors (7)
  1. Deming Ye (10 papers)
  2. Yankai Lin (125 papers)
  3. Jiaju Du (3 papers)
  4. Zhenghao Liu (77 papers)
  5. Peng Li (390 papers)
  6. Maosong Sun (337 papers)
  7. Zhiyuan Liu (433 papers)
Citations (170)

Summary

Coreferential Reasoning Learning for Language Representation: An Evaluation of CorefBERT

The paper "Coreferential Reasoning Learning for Language Representation" introduces CorefBERT, a language representation model designed to strengthen coreferential reasoning in NLP tasks. Pre-trained language models such as BERT have achieved remarkable success across NLP, yet they do not explicitly handle coreference, which is essential for coherent discourse understanding. CorefBERT addresses this gap with a dedicated pre-training approach for modeling and predicting coreferential relations.

Technical Summary

CorefBERT introduces a new pre-training task, Mention Reference Prediction (MRP), trained alongside the standard Masked Language Modeling (MLM) objective. In MRP, one occurrence of a noun that is repeated in a passage is masked, and the model must recover it from its other mentions, which pushes the model to exploit coreference links in the text. The task is implemented with a copy-based training objective: rather than predicting the masked mention over the entire vocabulary, the model learns to copy the corresponding word from the surrounding context, yielding representations that align more closely with coreference resolution.
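
To make the copy-based objective concrete, the following is a minimal PyTorch sketch of this kind of loss, not the authors' implementation: the function name copy_based_loss, its argument names, and the plain dot-product scoring are illustrative assumptions, and the paper's actual formulation aggregates scores over all occurrences of the masked word and combines the MRP loss with MLM.

```python
import torch
import torch.nn.functional as F

def copy_based_loss(hidden, masked_idx, candidate_idx, gold_mask):
    """Sketch of a copy-based mention-prediction loss (assumed formulation).

    hidden:        (seq_len, d) contextual representations from the encoder
    masked_idx:    position of the masked mention token
    candidate_idx: positions of context tokens that may be copied from
    gold_mask:     bool tensor over candidate_idx, True where the candidate
                   is a genuine occurrence of the masked word
    """
    query = hidden[masked_idx]            # (d,)
    keys = hidden[candidate_idx]          # (num_candidates, d)
    scores = keys @ query                 # dot-product copy scores (assumption)
    probs = F.softmax(scores, dim=-1)     # distribution over copyable context words
    # Sum the probability mass on all correct occurrences, then take the
    # negative log-likelihood of copying the right word.
    p_correct = probs[gold_mask].sum().clamp_min(1e-12)
    return -torch.log(p_correct)

# Toy usage: a 6-token sequence with the mention masked at position 4.
hidden = torch.randn(6, 768)
loss = copy_based_loss(
    hidden,
    masked_idx=4,
    candidate_idx=torch.tensor([0, 1, 2, 3, 5]),
    gold_mask=torch.tensor([True, False, False, False, False]),
)
```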

Key Experimental Outcomes

CorefBERT shows substantial improvements on several downstream NLP tasks that depend on coreferential reasoning. One primary benchmark is the QUOREF dataset, which explicitly tests coreferential understanding; here, CorefBERT achieves notable F1 gains over BERT and RoBERTa baselines. Task-specific modifications, such as additional reasoning layers for QUOREF, further validate these findings. CorefBERT also outperforms baseline models on document-level relation extraction (DocRED) and fact verification (FEVER), tasks that inherently require linking entity mentions across text segments. Its robustness is further evidenced by comparable performance on the more generic NLP tasks of the GLUE benchmark, despite being optimized for coreference-specific reasoning.

Theoretical and Practical Implications

Conceptually, the core enhancements in CorefBERT show that targeted pre-training tasks can substantially improve specialized language understanding capabilities without compromising general performance. This aligns with the broader trend of enriching language representations with task-specific objectives.

Practically, CorefBERT's advancements imply that coreference-aware representations can directly benefit applications requiring complex entity tracking, typical examples being multi-hop question answering and document-level information extraction. The method's ability to work with stronger pre-trained models such as RoBERTa suggests it scales to large datasets and a broad range of use cases.
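
Because CorefBERT keeps BERT's architecture, a converted checkpoint can in principle be used as a drop-in replacement in standard fine-tuning pipelines. The sketch below is a hedged illustration using Hugging Face transformers for extractive QA: the ./corefbert-base path is a hypothetical placeholder (the released weights at https://github.com/thunlp/CorefBERT may need conversion first), the question/context pair is invented, and in practice the model would first be fine-tuned on a task such as QUOREF.

```python
from transformers import BertForQuestionAnswering, BertTokenizerFast

# "./corefbert-base" is a hypothetical local path, not an official model ID.
CHECKPOINT = "./corefbert-base"

tokenizer = BertTokenizerFast.from_pretrained(CHECKPOINT)
model = BertForQuestionAnswering.from_pretrained(CHECKPOINT)

# Invented example requiring pronoun resolution across sentences.
question = "Who was carrying a red umbrella?"
context = "Mary met Alice at the station. She was carrying a red umbrella."
inputs = tokenizer(question, context, return_tensors="pt")
outputs = model(**inputs)

# Decode the highest-scoring answer span, as in standard extractive QA.
start = outputs.start_logits.argmax().item()
end = outputs.end_logits.argmax().item()
answer = tokenizer.decode(inputs["input_ids"][0][start : end + 1])
print(answer)
```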

Future Directions

While CorefBERT sets a precedent for integrating coreferential reasoning into pre-trained language models, there remains room for development. One avenue is broader handling of coreferential elements such as pronouns, for example through joint models that resolve antecedent-anaphor relations more directly. Another is mitigating the noise introduced by unsupervised coreference supervision through improved labeling strategies, which could further refine the pre-training signal.

In summary, CorefBERT represents a significant step towards embedding nuanced discourse understanding in pre-trained language models, offering both theoretical insight and practical utility in advancing the capacity of AI systems to process complex text structures.