
Towards Incremental Transformers: An Empirical Analysis of Transformer Models for Incremental NLU

Published 15 Sep 2021 in cs.CL (arXiv:2109.07364v2)

Abstract: Incremental processing allows interactive systems to respond based on partial input, a desirable property in, e.g., dialogue agents. The currently popular Transformer architecture inherently processes sequences as a whole, abstracting away the notion of time. Recent work attempts to apply Transformers incrementally via restart-incrementality: repeatedly feeding increasingly longer input prefixes to an unchanged model to produce partial outputs. However, this approach is computationally costly and does not scale efficiently for long sequences. In parallel, there are efforts to make Transformers more efficient, e.g. the Linear Transformer (LT) with a recurrence mechanism. In this work, we examine the feasibility of LT for incremental NLU in English. Our results show that the recurrent LT model has better incremental performance and faster inference speed than both the standard Transformer and the LT with restart-incrementality, at the cost of part of the non-incremental (full-sequence) quality. We show that this performance drop can be mitigated by training the model to wait for right context before committing to an output, and that training with input prefixes is beneficial for delivering correct partial outputs.
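The contrast the abstract draws can be made concrete with a toy sketch. Below, restart-incrementality re-encodes every prefix from scratch (quadratic in the number of token steps), while the recurrent Linear Transformer view carries a running state across time and touches each token once. This is a minimal NumPy illustration of causal linear attention with the ELU+1 feature map, not the paper's implementation; the projection matrices, dimensions, and embeddings are toy placeholders.

```python
import numpy as np

np.random.seed(0)
d = 4   # toy model dimension (placeholder)
T = 6   # toy sequence length (placeholder)

# Toy projections standing in for one trained attention head.
Wq, Wk, Wv = (np.random.randn(d, d) * 0.1 for _ in range(3))
X = np.random.randn(T, d)  # toy token embeddings

def phi(x):
    """ELU(x) + 1 feature map used by the Linear Transformer."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_linear_attention(prefix):
    """Encode a whole prefix at once, as a restart-incremental pass would."""
    Q, K, V = phi(prefix @ Wq), phi(prefix @ Wk), prefix @ Wv
    S = np.zeros((d, d))   # running sum of outer(phi(k_j), v_j) for j <= i
    z = np.zeros(d)        # running sum of phi(k_j) for j <= i
    out = []
    for q, k, v in zip(Q, K, V):
        S += np.outer(k, v)
        z += k
        out.append(q @ S / (q @ z + 1e-9))
    return np.stack(out)

# 1) Restart-incrementality: re-run the model on every prefix -> O(T^2) token steps.
restart_outputs = [causal_linear_attention(X[:t]) for t in range(1, T + 1)]

# 2) Recurrent LT: carry the state (S, z) across time -> O(T) token steps.
S, z, recurrent_outputs = np.zeros((d, d)), np.zeros(d), []
for x in X:
    q, k, v = phi(x @ Wq), phi(x @ Wk), x @ Wv
    S += np.outer(k, v)
    z += k
    recurrent_outputs.append(q @ S / (q @ z + 1e-9))
recurrent_outputs = np.stack(recurrent_outputs)

# Both routes produce the same outputs, but the recurrent one never revisits old tokens.
assert np.allclose(restart_outputs[-1], recurrent_outputs)
```

Because the attention here is causal, the two routes agree exactly; for a bidirectional encoder (as in the restart-incremental setups the paper compares against), earlier partial outputs can also be revised as the prefix grows, which is part of what makes restart-incrementality costly.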
