
Reproducing "ner and pos when nothing is capitalized" (2109.08396v1)

Published 17 Sep 2021 in cs.CL

Abstract: Capitalization is an important feature in many NLP tasks such as Named Entity Recognition (NER) or Part of Speech (POS) tagging. We attempt to reproduce the results of a paper that shows how to mitigate the significant performance drop that occurs when casing is mismatched between training and testing data. In particular, we show that lowercasing 50% of the dataset provides the best performance, matching the claims of the original paper. We also obtain slightly lower performance in almost all of the experiments we reproduced, suggesting that some hidden factors may be impacting our results. Lastly, we make all of our work available in a public GitHub repository.
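
The 50% lowercasing mentioned in the abstract is a data-augmentation step applied to the training set before fitting the tagger. Below is a minimal sketch of that idea, assuming the training data is a list of sentences where each sentence is a list of (token, tag) pairs; the function name lowercase_half and the data layout are illustrative assumptions, not code from the paper's repository.

```python
import random

def lowercase_half(sentences, fraction=0.5, seed=0):
    """Lowercase a random fraction of the training sentences.

    Each sentence is a list of (token, tag) pairs; only the tokens are
    lowercased, the NER/POS tags are left untouched.
    (Illustrative sketch; names and data layout are assumptions.)
    """
    rng = random.Random(seed)
    augmented = []
    for sent in sentences:
        if rng.random() < fraction:
            augmented.append([(tok.lower(), tag) for tok, tag in sent])
        else:
            augmented.append(list(sent))
    return augmented

# Example with a single CoNLL-style sentence (tokens paired with NER tags)
train = [[("Barack", "B-PER"), ("Obama", "I-PER"), ("visited", "O"), ("Paris", "B-LOC")]]
print(lowercase_half(train, fraction=0.5, seed=1))
```

Because only the surface forms change while the labels are preserved, the model sees both cased and uncased variants of the same distribution, which is what allows it to cope with a casing mismatch at test time.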

Authors (3)
  1. Andreas Kuster
  2. Jakub Filipek
  3. Viswa Virinchi Muppirala
