Incremental Processing in the Age of Non-Incremental Encoders: An Empirical Assessment of Bidirectional Models for Incremental NLU (2010.05330v2)

Published 11 Oct 2020 in cs.CL

Abstract: While humans process language incrementally, the best language encoders currently used in NLP do not. Both bidirectional LSTMs and Transformers assume that the sequence to be encoded is available in full, to be processed either forwards and backwards (BiLSTMs) or as a whole (Transformers). We investigate how they behave under incremental interfaces, when partial output must be provided based on the partial input seen up to a certain time step, as may happen in interactive systems. We test five models on various NLU datasets and compare their performance using three incremental evaluation metrics. The results support the possibility of using bidirectional encoders in incremental mode while retaining most of their non-incremental quality. The "omni-directional" BERT model, which achieves better non-incremental performance, is impacted more by incremental access. This can be alleviated by adapting the training regime (truncated training) or the testing procedure, by delaying the output until some right context is available, or by incorporating hypothetical right contexts generated by a language model like GPT-2.
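The incremental interface the abstract describes can be sketched as follows: a non-incremental sequence labeller is simply re-run on every growing input prefix, optionally withholding the last few labels until some right context has arrived. The toy tagger and function names below are illustrative assumptions, not the paper's actual models; the edit-overhead measure follows the general idea of edit-based incremental metrics (fraction of output edits that turn out to be unnecessary revisions).

```python
# Sketch of an incremental interface for a non-incremental tagger:
# the tagger is re-run on each input prefix (restart-incremental mode).
# `toy_tagger` is a hypothetical stand-in, not one of the paper's models.

def toy_tagger(prefix):
    """Labels 'duck' as VERB only once the right context ('down') is
    visible, otherwise as NOUN; all other tokens get 'O'."""
    labels = []
    for i, tok in enumerate(prefix):
        if tok == "duck":
            labels.append("VERB" if prefix[i + 1:i + 2] == ["down"] else "NOUN")
        else:
            labels.append("O")
    return labels

def incremental_outputs(tagger, tokens, delay=0):
    """Run `tagger` on each growing prefix; withhold the last `delay`
    labels to wait for right context, flushing once input is complete."""
    outputs = []
    for t in range(1, len(tokens) + 1):
        outputs.append(tagger(tokens[:t])[: max(0, t - delay)])
    if delay:
        outputs.append(tagger(tokens))  # final flush at end of input
    return outputs

def edit_overhead(outputs):
    """Unnecessary edits / total edits: 0.0 means no emitted label
    was ever revised in a later time step."""
    edits, prev = 0, []
    for cur in outputs:
        edits += sum(a != b for a, b in zip(prev, cur))  # revisions
        edits += max(0, len(cur) - len(prev))            # additions
        prev = cur
    necessary = len(prev)  # each final label must be added once
    return (edits - necessary) / edits if edits else 0.0

tokens = ["please", "duck", "down", "now"]
print(edit_overhead(incremental_outputs(toy_tagger, tokens, delay=0)))  # 0.2
print(edit_overhead(incremental_outputs(toy_tagger, tokens, delay=1)))  # 0.0
```

With no delay, the tagger commits to NOUN for "duck" before seeing "down" and must revise, giving non-zero edit overhead; delaying output by one token eliminates the revision, mirroring the right-context delay strategy mentioned in the abstract.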
