BootTOD: Bootstrap Task-oriented Dialogue Representations by Aligning Diverse Responses (2403.01163v1)

Published 2 Mar 2024 in cs.CL

Abstract: Pre-trained LLMs have been successful in many scenarios. However, their usefulness in task-oriented dialogues is limited due to the intrinsic linguistic differences between general text and task-oriented dialogues. Current task-oriented dialogue pre-training methods rely on a contrastive framework, which faces challenges such as selecting true positives and hard negatives, as well as lacking diversity. In this paper, we propose a novel dialogue pre-training model called BootTOD. It learns task-oriented dialogue representations via a self-bootstrapping framework. Unlike contrastive counterparts, BootTOD aligns context and context+response representations and dismisses the requirements of contrastive pairs. BootTOD also uses multiple appropriate response targets to model the intrinsic one-to-many diversity of human conversations. Experimental results show that BootTOD outperforms strong TOD baselines on diverse downstream dialogue tasks.
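
To make the self-bootstrapping idea concrete, below is a minimal PyTorch sketch of a BootTOD-style alignment objective: an online encoder embeds the dialogue context, a target branch embeds context+response under stop-gradient, and the two representations are aligned across several candidate responses to reflect one-to-many dialogue diversity. Everything here is an illustrative assumption (the toy encoder, the cosine-distance loss, and the names ToyEncoder and boottod_style_loss), not the paper's exact architecture or hyperparameters.

```python
# Sketch of a BootTOD-style self-bootstrapping alignment loss.
# Assumptions: BYOL-like stop-gradient target branch, cosine-distance alignment,
# toy mean-pooled encoders standing in for a BERT-style dialogue encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEncoder(nn.Module):
    """Stand-in for a dialogue encoder producing one vector per input sequence."""
    def __init__(self, vocab_size=1000, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, token_ids):                 # token_ids: (batch, seq_len)
        pooled = self.emb(token_ids).mean(dim=1)  # mean-pool token embeddings
        return self.proj(pooled)                  # (batch, dim)

def boottod_style_loss(online, target, context_ids, context_response_ids_list):
    """Align the online context representation with target representations of
    context+response, averaged over multiple candidate responses (one-to-many)."""
    z_ctx = F.normalize(online(context_ids), dim=-1)
    losses = []
    for ctx_resp_ids in context_response_ids_list:
        with torch.no_grad():                     # stop-gradient target branch
            z_tgt = F.normalize(target(ctx_resp_ids), dim=-1)
        losses.append((1 - (z_ctx * z_tgt).sum(dim=-1)).mean())  # cosine distance
    return torch.stack(losses).mean()

# Usage with random token ids standing in for tokenized dialogues.
online, target = ToyEncoder(), ToyEncoder()
ctx = torch.randint(0, 1000, (4, 16))
ctx_resp_variants = [torch.randint(0, 1000, (4, 24)) for _ in range(3)]
loss = boottod_style_loss(online, target, ctx, ctx_resp_variants)
loss.backward()  # gradients flow only through the online branch
```

Note that the stop-gradient target is one common way such bootstrapping objectives avoid representation collapse without contrastive negatives; the paper's exact mechanism for the target branch and response targets may differ.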

Authors (5)
  1. Weihao Zeng (24 papers)
  2. Keqing He (47 papers)
  3. Yejie Wang (15 papers)
  4. Dayuan Fu (13 papers)
  5. Weiran Xu (58 papers)