Memory-Augmented Generative Adversarial Transformers (2402.19218v1)

Published 29 Feb 2024 in cs.CL

Abstract: Conversational AI systems that rely on LLMs, like Transformers, have difficulty interweaving external data (like facts) with the language they generate. Vanilla Transformer architectures are not designed for answering factual questions with high accuracy. This paper investigates a possible route for addressing this problem. We propose to extend the standard Transformer architecture with an additional memory bank holding extra information (such as facts drawn from a knowledge base), and an extra attention layer for addressing this memory. We add this augmented memory to a Generative Adversarial Network-inspired Transformer architecture. This setup allows for implementing arbitrary felicity conditions on the generated language of the Transformer. We first demonstrate how this machinery can be deployed for handling factual questions in goal-oriented dialogues. Secondly, we demonstrate that our approach can be useful for applications like *style adaptation* as well: the adaptation of utterances according to certain stylistic (external) constraints, like social properties of human interlocutors in dialogues.
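
As a rough illustration of the idea described in the abstract (not the paper's actual implementation), the sketch below shows a Transformer block extended with a learned memory bank and an extra attention layer that addresses it. All module names, dimensions, and the PyTorch framing are assumptions, and the GAN-style discriminator the paper couples to the generator is omitted.

```python
import torch
import torch.nn as nn


class MemoryAugmentedBlock(nn.Module):
    """Hypothetical Transformer block with an extra attention layer over an
    external memory bank (e.g., embedded facts from a knowledge base)."""

    def __init__(self, d_model: int, n_heads: int, n_memory_slots: int):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Additional attention layer for addressing the memory bank.
        self.memory_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Memory bank holding extra information; here a learned parameter,
        # but it could equally be filled with precomputed fact embeddings.
        self.memory_bank = nn.Parameter(torch.randn(n_memory_slots, d_model))
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal self-attention over the generated sequence.
        seq_len = x.size(1)
        causal = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        h, _ = self.self_attn(x, x, x, attn_mask=causal, need_weights=False)
        x = self.norm1(x + h)
        # Extra attention step over the memory bank (facts / stylistic constraints).
        mem = self.memory_bank.unsqueeze(0).expand(x.size(0), -1, -1)
        h, _ = self.memory_attn(x, mem, mem, need_weights=False)
        x = self.norm2(x + h)
        return self.norm3(x + self.ffn(x))


# Usage example with illustrative sizes.
block = MemoryAugmentedBlock(d_model=256, n_heads=4, n_memory_slots=32)
out = block(torch.randn(2, 10, 256))  # (batch, seq_len, d_model)
```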
