Compositional Generalization in Spoken Language Understanding (2312.15815v1)

Published 25 Dec 2023 in cs.CL

Abstract: State-of-the-art spoken language understanding (SLU) models have shown tremendous success on benchmark SLU datasets, yet they still fail in many practical scenarios due to a lack of model compositionality when trained on limited data. In this paper, we study two types of compositionality: (a) novel slot combination, and (b) length generalization. We first conduct an in-depth analysis and find that state-of-the-art SLU models often learn spurious slot correlations during training, which leads to poor performance in both compositional cases. To mitigate these limitations, we create the first compositional splits of benchmark SLU datasets and propose the first compositional SLU model, including a compositional loss and paired training that tackle each compositional case respectively. On both the benchmark and compositional splits of ATIS and SNIPS, we show that our compositional SLU model significantly outperforms (by up to $5\%$ F1 score) the state-of-the-art BERT SLU model.
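To make the "novel slot combination" setting concrete, here is a minimal, hypothetical sketch of how a compositional split could be constructed: utterances whose exact set of slot types never co-occurs in the training pool are held out as a compositional test set. All function and field names below are illustrative assumptions; the paper's actual splitting procedure may differ.

```python
# Hypothetical sketch of a "novel slot combination" compositional split.
# An utterance is held out if its combination of slot types was never
# seen together during training. Names here are illustrative only.

def compositional_split(utterances, train_combos):
    """Partition utterances by whether their slot-type combination
    appears in train_combos (seen) or not (novel/compositional)."""
    seen, novel = [], []
    for utt in utterances:
        combo = frozenset(slot for _, slot in utt["slots"])
        (seen if combo in train_combos else novel).append(utt)
    return seen, novel

# Toy ATIS-style examples: slot values paired with slot types.
data = [
    {"text": "flights from boston to denver",
     "slots": [("boston", "fromloc"), ("denver", "toloc")]},
    {"text": "morning flights from boston",
     "slots": [("morning", "depart_time"), ("boston", "fromloc")]},
]
# Suppose only the {fromloc, toloc} combination occurred in training.
train_combos = {frozenset({"fromloc", "toloc"})}

seen, novel = compositional_split(data, train_combos)
```

Under this toy setup, the second utterance pairs `depart_time` with `fromloc`, a combination absent from training, so it lands in the compositional test set; a model that learned a spurious correlation (e.g. that `fromloc` always co-occurs with `toloc`) would be penalized exactly here.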
