GSQA: An End-to-End Model for Generative Spoken Question Answering (2312.09781v4)

Published 15 Dec 2023 in cs.CL and cs.AI

Abstract: End-to-end models have made significant strides in spoken question answering (QA). However, prior work has focused primarily on extractive span selection. While this extractive approach is effective when answers are present verbatim in the input, it falls short on abstractive questions, whose answers must be inferred from the given information rather than extracted directly. To bridge this gap, we introduce the first end-to-end Generative Spoken Question Answering (GSQA) model, which equips the system for abstractive reasoning. The central challenge in training GSQA is the absence of a spoken abstractive QA dataset. We therefore initialize from text models and leverage an extractive QA dataset to transfer knowledge from the text generative model to the spoken generative model. Experimental results indicate that our model surpasses the previous extractive model by 3% on extractive QA datasets. Moreover, although GSQA is fine-tuned only on spoken extractive QA data and has never seen spoken abstractive QA data, it closely matches the performance of the cascade model. In conclusion, GSQA shows the potential to generalize to a broad spectrum of questions, extending spoken question answering to abstractive QA. Our code is available at https://voidful.github.io/GSQA
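
The recipe described in the abstract — initialize from a textually pretrained seq2seq model, represent speech as discrete units, and fine-tune on spoken extractive QA — can be sketched roughly as follows. This is a minimal illustration under explicit assumptions, not the authors' implementation: the LongT5 checkpoint, the 500-entry unit codebook, the <unit_i>/<sep> pseudo-tokens, and the stubbed HuBERT quantizer are all illustrative choices; the actual code is linked from https://voidful.github.io/GSQA.

```python
# Sketch of a GSQA-style setup: a textually pretrained seq2seq model
# consumes discrete speech units rendered as pseudo-tokens.
# ASSUMPTIONS (not the authors' exact setup): checkpoint, codebook size,
# pseudo-token scheme, and the stubbed quantizer below.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

N_UNITS = 500  # assumed k-means codebook size for HuBERT units

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/long-t5-tglobal-base")

# Extend the text vocabulary with pseudo-tokens for discrete speech units
# so the textually initialized model can consume unit sequences.
tokenizer.add_tokens([f"<unit_{i}>" for i in range(N_UNITS)] + ["<sep>"])
model.resize_token_embeddings(len(tokenizer))

def speech_to_units(waveform):
    """Stub: the real pipeline runs HuBERT feature extraction plus
    k-means quantization. A fixed placeholder sequence is returned here."""
    return [17, 17, 42, 42, 42, 3]

def units_to_string(units):
    # Collapse consecutive repeats (standard in discrete-unit pipelines)
    # and render each unit as its pseudo-token.
    deduped = [u for i, u in enumerate(units) if i == 0 or u != units[i - 1]]
    return "".join(f"<unit_{u}>" for u in deduped)

# Spoken question and spoken passage, both as discrete-unit strings.
prompt = (units_to_string(speech_to_units(None)) + "<sep>"
          + units_to_string(speech_to_units(None)))

inputs = tokenizer(prompt, return_tensors="pt")
answer_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(answer_ids[0], skip_special_tokens=True))
```

Out of the box this sketch emits nothing meaningful; per the abstract, it is the fine-tuning on unit-encoded spoken extractive QA data that transfers the text model's generative ability to speech inputs. In a fully spoken variant the answer could likewise be emitted as units and rendered back to audio with a unit vocoder; here text is decoded for simplicity.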
