Analysis of Stochastic Answer Networks for Machine Reading Comprehension
The paper "Stochastic Answer Networks for Machine Reading Comprehension" presents a novel approach to machine reading comprehension (MRC) through a stochastic answer network (SAN). The model's key mechanism is stochastic prediction dropout, which improves robustness and accuracy in machine reading tasks without the complexity typically associated with reinforcement learning methods.
Core Contributions
- Stochastic Prediction Dropout: Unlike prior models such as ReasoNet, which use reinforcement learning to decide when to stop multi-step reasoning, SAN applies stochastic prediction dropout to the answer module (the final layer) during training. The prediction made at each reasoning step is randomly dropped, and the final answer is the average of the retained predictions. This acts as a stochastic ensemble over reasoning steps, improving robustness and accuracy.
- Multi-step Reasoning: SAN follows a fixed multi-step reasoning approach but improves upon it by retaining predictions across all steps rather than focusing solely on the final step. This setup aligns more closely with how humans naturally re-read and re-synthesize information iteratively to infer answers, thereby outperforming models that rely on a single prediction step.
- Architectural Design: The SAN architecture is divided into four layers: Lexicon Encoding, Contextual Encoding, Memory Generation, and the Answer Module. Pre-trained GloVe and CoVe embeddings feed the lexicon encoding, and attention mechanisms fuse question and passage representations to build the working memory over which the answer module reasons.
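The interplay of multi-step reasoning and stochastic prediction dropout can be sketched in a few lines. The snippet below is a deliberately simplified illustration, not the paper's implementation: attention is a single bilinear product, a tanh update stands in for the GRU state transition the paper uses, and all names (`stochastic_answer_steps`, `W_attn`, `W_score`) and the default dropout rate are hypothetical choices for this sketch.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def stochastic_answer_steps(memory, state, W_attn, W_score,
                            num_steps=5, drop_prob=0.4,
                            training=True, rng=None):
    """Average per-step answer distributions, randomly dropping whole
    steps during training (stochastic prediction dropout).

    memory: (seq_len, hidden) passage memory from the lower layers.
    state:  (hidden,) initial reasoning state.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    step_preds = []
    for _ in range(num_steps):
        # Attend over the memory with the current reasoning state.
        weights = softmax(memory @ (W_attn @ state))   # (seq_len,)
        context = weights @ memory                     # (hidden,)
        # Placeholder state update; the paper uses a GRU cell here.
        state = np.tanh(context + state)
        # Per-step answer distribution over passage positions.
        step_preds.append(softmax(memory @ (W_score @ state)))
    preds = np.stack(step_preds)                       # (num_steps, seq_len)
    if training:
        # Keep each step's prediction with probability 1 - drop_prob.
        keep = rng.random(num_steps) > drop_prob
        if not keep.any():          # guard: retain at least one step
            keep[rng.integers(num_steps)] = True
        preds = preds[keep]
    return preds.mean(axis=0)       # averaged answer distribution
```

At inference time (`training=False`) no steps are dropped, so the model deterministically averages all per-step predictions, which is the behavior the paper describes at test time.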
Empirical Results
The SAN model demonstrates competitive performance across multiple benchmarks. On the SQuAD dataset, SAN achieves an Exact Match (EM) score of 76.235% and an F1 score of 84.056%, outperforming single-step baselines and a dynamic-step ReasoNet-style model. SAN also shows improved resilience on the AddSent and AddOneSent adversarial variants of SQuAD, which are designed to probe model robustness. On the MS MARCO dataset, which features real-world user queries and multiple candidate passages, SAN achieves a BLEU score of 43.85 and a ROUGE-L score of 46.14, indicating strong adaptability across task settings.
Theoretical and Practical Implications
The introduction of stochastic prediction dropout in SAN contributes to the understanding of step bias in multi-step reasoning, suggesting that preventing reliance on a single prediction step can lead to more stable and accurate models. The move away from reinforcement learning for multi-step processing also simplifies the training process while yielding robust outcomes. In practice, the capability of SAN to maintain strong performance across different datasets and adversarial challenges positions it as a reliable solution for real-world MRC applications, such as conversational agents or customer service chatbots.
Future Directions
Future research could explore the theoretical connections between SAN and other memory network models, and extend SAN to other NLP tasks such as text classification and natural language inference. Given its success across varied datasets, further architectural enhancements or richer linguistic features could improve SAN's performance further. Investigating how SAN scales to longer, more complex contexts would also be a valuable extension of this work.
Overall, the proposed SAN model marks a substantial improvement in the field of MRC, combining simplicity and effectiveness to address the demands of current language understanding tasks.