Linguistically-Informed Self-Attention for Semantic Role Labeling (1804.08199v3)

Published 23 Apr 2018 in cs.CL

Abstract: Current state-of-the-art semantic role labeling (SRL) uses a deep neural network with no explicit linguistic features. However, prior work has shown that gold syntax trees can dramatically improve SRL decoding, suggesting the possibility of increased accuracy from explicit modeling of syntax. In this work, we present linguistically-informed self-attention (LISA): a neural network model that combines multi-head self-attention with multi-task learning across dependency parsing, part-of-speech tagging, predicate detection and SRL. Unlike previous models which require significant pre-processing to prepare linguistic features, LISA can incorporate syntax using merely raw tokens as input, encoding the sequence only once to simultaneously perform parsing, predicate detection and role labeling for all predicates. Syntax is incorporated by training one attention head to attend to syntactic parents for each token. Moreover, if a high-quality syntactic parse is already available, it can be beneficially injected at test time without re-training our SRL model. In experiments on CoNLL-2005 SRL, LISA achieves new state-of-the-art performance for a model using predicted predicates and standard word embeddings, attaining 2.5 F1 absolute higher than the previous state-of-the-art on newswire and more than 3.5 F1 on out-of-domain data, nearly 10% reduction in error. On CoNLL-2012 English SRL we also show an improvement of more than 2.5 F1. LISA also out-performs the state-of-the-art with contextually-encoded (ELMo) word representations, by nearly 1.0 F1 on news and more than 2.0 F1 on out-of-domain text.

Citations (376)

Summary

  • The paper introduces LISA, a novel architecture that integrates syntactic information into self-attention to improve semantic role labeling performance.
  • It employs multi-task learning to jointly perform dependency parsing, POS tagging, predicate detection, and SRL in one efficient pass.
  • Incorporating ELMo and high-quality syntactic parses, LISA achieves state-of-the-art F1 scores on CoNLL datasets with error reductions of up to 10%.

Linguistically-Informed Self-Attention for Semantic Role Labeling

The paper, "Linguistically-Informed Self-Attention for Semantic Role Labeling," presents a novel neural architecture named LISA, which integrates linguistic features directly into the self-attention mechanism for the task of Semantic Role Labeling (SRL). This methodology defies the recent trend of employing deep learning models that typically exclude explicit syntactic information yet achieve state-of-the-art performances. The research emphasizes the potential accuracy improvements by incorporating syntax in the self-attention frameworks used in processing linguistic data, specifically SRL.

Model Overview and Innovations

LISA's core innovation is the combination of multi-head self-attention with multi-task learning. The architecture jointly performs dependency parsing, part-of-speech tagging, predicate detection, and SRL directly from raw tokens. Its distinctive feature is how syntax is incorporated: one attention head is trained to attend to each token's syntactic parent. This design lets LISA operate end-to-end, eliminating the extensive pre-processing and external predicate detection modules typical of many SRL systems.
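
To make the mechanism concrete, the sketch below shows how a single attention head can be supervised so that each token's attention row concentrates on its dependency parent. It is a minimal NumPy illustration with hypothetical names and dimensions, not the paper's implementation, whose scoring function and training details differ.

```python
# Illustrative sketch (not the authors' released code) of supervising one
# attention head so its attention distribution points at each token's
# dependency parent. Names, shapes, and the dot-product scorer are assumptions.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def syntax_head_attention(H, Wq, Wk, parent_ids=None):
    """Scaled dot-product attention for a single head over token states H.

    H:          (seq_len, d_model) token representations
    Wq, Wk:     (d_model, d_head) projection matrices
    parent_ids: optional list of dependency-head indices, one per token;
                when given, a cross-entropy loss pushes each token's
                attention row toward its syntactic parent.
    """
    Q, K = H @ Wq, H @ Wk
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # (seq_len, seq_len) attention scores
    attn = softmax(scores, axis=-1)          # row i: distribution over candidate parents of token i
    loss = None
    if parent_ids is not None:
        idx = np.arange(len(parent_ids))
        loss = -np.log(attn[idx, parent_ids] + 1e-9).mean()
    return attn, loss

# Toy usage with hypothetical dimensions and parent indices.
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))
Wq = rng.normal(size=(8, 4))
Wk = rng.normal(size=(8, 4))
attn, parse_loss = syntax_head_attention(H, Wq, Wk, parent_ids=[1, 1, 4, 4, 4])
```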

Because the tasks share a single encoder, LISA encodes each sentence only once and performs all tasks from that single pass, rather than re-encoding the input separately for each task. Furthermore, if a high-quality syntactic parse is available at test time, it can be injected into the syntax-aware attention head to improve SRL performance without retraining the model.
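
A sketch of how such test-time injection could work under the same illustrative setup as above (hypothetical function name, not the released code): the learned attention of the syntax head is replaced with one-hot rows that point at the externally supplied parent of each token.

```python
import numpy as np

def inject_parse(seq_len, parent_ids):
    """Build a one-hot attention matrix from an external (e.g. gold) parse.

    Row i attends only to parent_ids[i]; substituting this matrix for the
    syntax head's learned attention at test time injects the parse without
    retraining any parameters.
    """
    attn = np.zeros((seq_len, seq_len))
    attn[np.arange(seq_len), parent_ids] = 1.0
    return attn

# Example: 5 tokens whose externally supplied dependency heads are 1, 1, 4, 4, 4.
gold_attn = inject_parse(5, [1, 1, 4, 4, 4])
```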

Experimental Results and Performance

LISA achieves new state-of-the-art performance on the CoNLL-2005 and CoNLL-2012 SRL datasets using predicted predicates and standard word embeddings. On CoNLL-2005, LISA improves F1 by 2.5 points on newswire and more than 3.5 points on out-of-domain data, corresponding to nearly a 10% reduction in error relative to the previous state of the art; on CoNLL-2012 English SRL the improvement exceeds 2.5 F1. With ELMo word representations, LISA still leads, outperforming the best existing models by nearly 1.0 F1 on news and more than 2.0 F1 on out-of-domain text.

Additionally, when provided with gold-quality syntactic parses at test time, LISA achieves further significant gains over models that do not use syntax, underscoring the latent value of integrating syntactic awareness into neural architectures for natural language processing.

Implications and Future Directions

By efficiently harnessing syntactic structures, LISA shows that linguistically-informed attention mechanisms are advantageous for SRL. This work underscores the importance of combining syntactic parsing advances with neural networks, offering an architecture that can seamlessly integrate external improvements in syntax without necessitating model re-engineering.

Looking forward, there are several avenues for enhancing LISA and similar models. Improving the accuracy of the learned parser, for instance by studying the interaction between syntactic structure and attention weights more closely, could raise SRL performance further. Training strategies tailored to multi-task setups, such as scheduled sampling, may also strengthen the model's robustness. Beyond SRL, LISA's architecture may lend itself to other NLP tasks that benefit from syntactic information, suggesting fertile ground for future exploration.

In summary, the paper introduces a sophisticated architectural advance in NLP, stressing the efficacy of syntactically-aware neural networks. The notable accomplishments in SRL task performance signify substantial potential for future research and applications across various domains of natural language processing.
