
Are Human Conversations Special? A Large Language Model Perspective (2403.05045v1)

Published 8 Mar 2024 in cs.CL, cs.AI, and cs.LG

Abstract: This study analyzes changes in the attention mechanisms of LLMs when used to understand natural conversations between humans (human-human). We analyze three use cases of LLMs: interactions over web content, code, and mathematical texts. By analyzing attention distance, dispersion, and interdependency across these domains, we highlight the unique challenges posed by conversational data. Notably, conversations require nuanced handling of long-term contextual relationships and exhibit higher complexity through their attention patterns. Our findings reveal that while LLMs exhibit domain-specific attention behaviors, there is a significant gap in their ability to specialize in human conversations. Through detailed attention entropy analysis and t-SNE visualizations, we demonstrate the need for models trained with a diverse array of high-quality conversational data to enhance understanding and generation of human-like dialogue. This research highlights the importance of domain specialization in LLMs and suggests pathways for future advancement in modeling human conversational nuances.
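The attention entropy analysis mentioned above can be illustrated with a minimal sketch: each row of a softmaxed attention matrix is a probability distribution over key positions, and its Shannon entropy measures how dispersed that head's attention is (higher entropy for conversational inputs would indicate more spread-out attention). The function below is a hypothetical illustration, not the authors' code; the tensor shape and names are assumptions.

```python
import numpy as np

def attention_entropy(attn, eps=1e-12):
    """Mean Shannon entropy per attention head.

    attn: array of shape (heads, query_len, key_len) whose last axis
    sums to 1 (i.e., softmax output). Returns one value per head,
    averaged over query positions. eps guards log(0).
    """
    attn = np.asarray(attn, dtype=np.float64)
    ent = -np.sum(attn * np.log(attn + eps), axis=-1)  # (heads, query_len)
    return ent.mean(axis=-1)

# Toy contrast: a uniformly attending head vs. a sharply peaked one.
uniform = np.full((1, 4, 4), 0.25)        # entropy -> log(4) ~ 1.386
peaked = np.eye(4)[None, :, :]            # one-hot rows, entropy -> 0
print(attention_entropy(uniform))
print(attention_entropy(peaked))
```

In practice these per-head entropies (computed over many inputs from each domain) would be the features fed into a dimensionality-reduction method such as t-SNE to visualize domain-specific attention behavior.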

