Named Entity Recognition with Bidirectional LSTM-CNNs

Published 26 Nov 2015 in cs.CL, cs.LG, and cs.NE | arXiv:1511.08308v5

Abstract: Named entity recognition is a challenging task that has traditionally required large amounts of knowledge in the form of feature engineering and lexicons to achieve high performance. In this paper, we present a novel neural network architecture that automatically detects word- and character-level features using a hybrid bidirectional LSTM and CNN architecture, eliminating the need for most feature engineering. We also propose a novel method of encoding partial lexicon matches in neural networks and compare it to existing approaches. Extensive evaluation shows that, given only tokenized text and publicly available word embeddings, our system is competitive on the CoNLL-2003 dataset and surpasses the previously reported state of the art performance on the OntoNotes 5.0 dataset by 2.13 F1 points. By using two lexicons constructed from publicly-available sources, we establish new state of the art performance with an F1 score of 91.62 on CoNLL-2003 and 86.28 on OntoNotes, surpassing systems that employ heavy feature engineering, proprietary lexicons, and rich entity linking information.

Summary

  • The paper presents a novel hybrid architecture that automatically learns word and character features to improve NER performance.
  • It achieves state-of-the-art F1 scores of 91.62 on CoNLL-2003 and 86.28 on OntoNotes 5.0, demonstrating significant practical improvements.
  • The study introduces an innovative lexicon encoding scheme that enhances match precision and reduces reliance on manual feature engineering.

The paper "Named Entity Recognition with Bidirectional LSTM-CNNs" by Jason P.C. Chiu and Eric Nichols presents a neural architecture designed to improve Named Entity Recognition (NER) performance while minimizing the need for extensive feature engineering. The proposed model effectively combines character-level Convolutional Neural Networks (CNNs) with bidirectional Long Short-Term Memory (LSTM) networks to automatically learn both word- and character-level features from tokenized text data and word embeddings.

Key Contributions

The study highlights several key contributions:

  1. Hybrid Neural Network Architecture: The model integrates character-level CNNs with bidirectional LSTM networks to leverage the capabilities of both architectures. This combination addresses the limitations of feed-forward models used in prior research, particularly the inability to capture long-distance dependencies.
  2. Minimal Feature Engineering: By learning important features automatically, the proposed approach reduces the reliance on hand-crafted features and lexicons traditionally used to achieve high NER performance.
  3. Novel Lexicon Encoding Scheme: The authors introduce a technique for encoding partial lexicon matches, enhancing the utility of external lexicons in the model.

Experimental Findings

The model's efficacy was evaluated on two major NER datasets: CoNLL-2003 and OntoNotes 5.0. The results indicated strong performance improvements:

  • CoNLL-2003 Dataset: With only tokenized text and publicly available word embeddings, the model was competitive with the state of the art; adding two lexicons constructed from publicly available sources raised its F1 score to 91.62, a new state of the art.
  • OntoNotes 5.0 Dataset: Even without lexicons, the system surpassed the previously reported state of the art by 2.13 F1 points; the full system reached an F1 score of 86.28.

In addition to evaluating each lexicon individually, the study found that the SENNA and DBpedia lexicons provided complementary benefits when used together, further improving performance on the CoNLL-2003 dataset.

Detailed Analysis

Word Embeddings

The study evaluated several sources of word embeddings, including Collobert's embeddings, Stanford's GloVe, and Google's word2vec, assessing each for its impact on model performance. Notably, Collobert's embeddings, trained in part on the Reuters RCV-1 corpus, performed best on the CoNLL-2003 dataset, whose text is itself drawn from Reuters newswire. This underscores the importance of in-domain training data for the quality of word embeddings.
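As a hedged sketch of how publicly released vectors in GloVe's plain-text format can be loaded into an embedding layer: the file name, vocabulary, and initialization range below are placeholders, not the paper's setup.

```python
import numpy as np
import torch
import torch.nn as nn

def load_pretrained(path: str, vocab: dict, dim: int) -> nn.Embedding:
    """Build an embedding layer from a whitespace-separated text file of
    vectors (one word per line); out-of-vocabulary rows stay random."""
    weights = np.random.uniform(-0.25, 0.25, (len(vocab), dim)).astype("float32")
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            if word in vocab and len(values) == dim:
                weights[vocab[word]] = np.asarray(values, dtype="float32")
    return nn.Embedding.from_pretrained(torch.from_numpy(weights), freeze=False)

# Hypothetical usage with a tiny vocabulary and 50-dimensional GloVe vectors.
vocab = {"<unk>": 0, "obama": 1, "london": 2}
emb = load_pretrained("glove.6B.50d.txt", vocab, dim=50)
```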

Character-level Features

Character-level CNNs significantly improved NER performance over models that relied on word embeddings together with hand-crafted features such as capitalization and character type. The CNNs extracted rich character-level features automatically, giving the network a more nuanced representation of the input text.
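A minimal sketch of such an extractor, in the spirit of the paper: characters are embedded, convolved, and max-pooled over time to yield one fixed-size feature vector per word. The embedding size, filter count, and kernel width below are illustrative.

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Per-word character feature extractor: embed characters, apply a 1-D
    convolution over the character sequence, then max-pool over time."""

    def __init__(self, num_chars: int, char_dim: int = 25,
                 num_filters: int = 30, kernel_size: int = 3):
        super().__init__()
        self.embed = nn.Embedding(num_chars, char_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_dim, num_filters, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, char_ids: torch.Tensor) -> torch.Tensor:
        # char_ids: (num_words, max_word_len), index 0 is padding
        x = self.embed(char_ids).transpose(1, 2)  # (num_words, char_dim, len)
        x = torch.relu(self.conv(x))              # (num_words, num_filters, len)
        return x.max(dim=2).values                # max over time -> (num_words, num_filters)

# Toy usage: 4 words, each padded to 12 characters.
cnn = CharCNN(num_chars=100)
feats = cnn(torch.randint(0, 100, (4, 12)))
print(feats.shape)  # torch.Size([4, 30])
```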

Dropout Regularization

The authors examined dropout regularization and found that it substantially improved the model's resistance to overfitting. Optimal dropout rates were determined empirically, and applying them yielded notable performance gains on both datasets.
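A small sketch of where dropout is commonly inserted in such a tagger, here on the token features entering the LSTM and on the LSTM output before the tag projection; the exact placement in the paper may differ, and the 0.5 rate is a placeholder rather than the paper's tuned value.

```python
import torch
import torch.nn as nn

class DroppedTagger(nn.Module):
    """The BiLSTM core from above with dropout applied to its input
    features and its output states (placement and rate are assumptions)."""

    def __init__(self, feature_dim: int, hidden_dim: int, num_tags: int,
                 p: float = 0.5):
        super().__init__()
        self.dropout = nn.Dropout(p)
        self.lstm = nn.LSTM(feature_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        hidden, _ = self.lstm(self.dropout(features))
        return self.out(self.dropout(hidden))
```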

Lexicon Matching

The proposed lexicon matching algorithm, which allows partial matches and encodes them with a BIOES scheme, outperformed simpler matching methods in both precision and recall. For the noisier DBpedia lexicon in particular, partial matching reduced spurious matches and improved match relevance.
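The sketch below illustrates the idea in simplified form: an n-gram counts as a match if it equals a lexicon entry, or is a prefix of one covering at least half of that entry, with longer matches taking precedence. This greedy, prefix-only rule is a simplification of the paper's exact algorithm, which also considers suffix matches.

```python
def bioes_lexicon_tags(tokens, lexicon):
    """Tag each token B/I/E/S if it falls inside a lexicon match, O otherwise.
    `lexicon` is a set of tuples of lowercased tokens. Greedy,
    longest-match-first; a sketch, not the paper's exact algorithm."""
    tokens_lc = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    max_len = max((len(e) for e in lexicon), default=0)
    i = 0
    while i < len(tokens):
        matched = 0
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            ngram = tuple(tokens_lc[i:i + n])
            # Exact match, or partial match: the n-gram is a prefix of
            # some entry and covers at least half of that entry.
            if ngram in lexicon or any(
                e[:n] == ngram and n * 2 >= len(e) for e in lexicon
            ):
                matched = n
                break
        if matched == 1:
            tags[i] = "S"                   # single-token match
        elif matched > 1:
            tags[i] = "B"                   # begin
            for j in range(i + 1, i + matched - 1):
                tags[j] = "I"               # inside
            tags[i + matched - 1] = "E"     # end
        i += max(matched, 1)
    return tags

# Toy usage with a two-entry "location" lexicon.
lexicon = {("new", "york", "city"), ("london",)}
print(bioes_lexicon_tags(["He", "visited", "New", "York", "yesterday"], lexicon))
# ['O', 'O', 'B', 'E', 'O']  -- "New York" partially matches "new york city"
```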

Implications and Future Work

The theoretical implications of this research extend to other NLP tasks involving sequential labeling, suggesting that hybrid neural architectures can effectively reduce the need for extensive feature engineering while achieving high performance. Practically, this model simplifies NER system development, making it more accessible for applications across different domains.

Future developments in this line of research could include more effective methods for constructing and applying lexicons, as well as extending the model to handle tasks like extended tagset NER and entity linking. Further exploration into different neural architectures and training algorithms for word embeddings could also yield additional performance boosts and robustness across more varied NER tasks.

In summary, this paper provides a substantial advancement in NER methodologies by presenting a sophisticated neural network architecture that learns intricate features from raw text. The approach not only achieves high accuracy but also simplifies the feature engineering process, thus broadening the accessibility and applicability of NER systems.
