Can Your Model Tell a Negation from an Implicature? Unravelling Challenges With Intent Encoders (2403.04314v1)

Published 7 Mar 2024 in cs.CL

Abstract: Conversational systems often rely on embedding models for intent classification and intent clustering tasks. The advent of LLMs, which enable instructional embeddings that allow one to adjust semantics over the embedding space using prompts, is being viewed as a panacea for these downstream conversational tasks. However, traditional evaluation benchmarks rely solely on task metrics that do not directly measure gaps related to semantic understanding. Thus, we propose an intent semantic toolkit that gives a more holistic view of intent embedding models by considering three tasks -- (1) intent classification, (2) intent clustering, and (3) a novel triplet task. The triplet task gauges the model's understanding of two semantic concepts paramount in real-world conversational systems -- negation and implicature. We observe that current embedding models fare poorly in semantic understanding of these concepts. To address this, we propose a pre-training approach that improves the embedding model by leveraging augmentation with data generated by an auto-regressive model and a contrastive loss term. Our approach improves the semantic understanding of the intent embedding model on the aforementioned linguistic dimensions while only slightly affecting performance on downstream task metrics.
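A minimal sketch of how the triplet task could be scored with an off-the-shelf sentence encoder: success means the anchor utterance embeds closer to a same-intent paraphrase than to its negation. The model name, example triplet, and margin-free criterion below are illustrative assumptions, not the paper's exact protocol.

```python
# Hedged sketch: scoring a negation triplet with a generic sentence encoder.
# Model name and example utterances are assumptions for illustration only.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works here

# Each triplet: (anchor, positive, negative). The negative is a negated
# paraphrase of the anchor, which should land far away in embedding space.
triplets = [
    ("I want to book a flight",
     "Please reserve a plane ticket for me",
     "I do not want to book a flight"),
]

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

correct = 0
for anchor, positive, negative in triplets:
    a, p, n = model.encode([anchor, positive, negative])
    # Count a hit when the anchor is closer to the paraphrase than to the negation.
    correct += cosine(a, p) > cosine(a, n)

print(f"triplet accuracy: {correct / len(triplets):.2f}")
```

An implicature triplet follows the same pattern, with the positive replaced by an utterance that only implies the intent rather than stating it.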
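And a minimal sketch of the kind of contrastive pre-training term the abstract describes, assuming in-batch positives and auto-regressively generated negations used as hard negatives; the temperature, batch construction, and function name are assumptions, not the authors' exact loss.

```python
# Hedged sketch of an InfoNCE-style contrastive term with negated utterances
# as hard negatives, in the spirit of the abstract's proposed pre-training.
import torch
import torch.nn.functional as F

def contrastive_loss(anchors, positives, negations, temperature=0.05):
    """anchors, positives, negations: (batch, dim) embedding tensors.
    positives[i] paraphrases anchors[i]; negations[i] is an
    auto-regressively generated negation of anchors[i]."""
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    n = F.normalize(negations, dim=-1)
    # Anchor-to-positive similarities double as in-batch negatives ...
    sim_pos = a @ p.T / temperature                            # (batch, batch)
    # ... and each anchor's own negation is appended as a hard negative.
    sim_neg = (a * n).sum(dim=-1, keepdim=True) / temperature  # (batch, 1)
    logits = torch.cat([sim_pos, sim_neg], dim=1)
    labels = torch.arange(a.size(0), device=a.device)          # diagonal is positive
    return F.cross_entropy(logits, labels)
```

Minimizing this pulls paraphrases toward the anchor while explicitly pushing the negated variant away, which is exactly the behavior the triplet task probes for.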

Authors (7)
  1. Yuwei Zhang (48 papers)
  2. Siffi Singh (7 papers)
  3. Sailik Sengupta (24 papers)
  4. Igor Shalyminov (20 papers)
  5. Hang Su (224 papers)
  6. Hwanjun Song (44 papers)
  7. Saab Mansour (32 papers)
Citations (1)