Did You Mean...? Confidence-based Trade-offs in Semantic Parsing

Published 29 Mar 2023 in cs.CL | (2303.16857v3)

Abstract: We illustrate how a calibrated model can help balance common trade-offs in task-oriented parsing. In a simulated annotator-in-the-loop experiment, we show that well-calibrated confidence scores allow us to balance cost with annotator load, improving accuracy with a small number of interactions. We then examine how confidence scores can help optimize the trade-off between usability and safety. We show that confidence-based thresholding can substantially reduce the number of incorrect low-confidence programs executed; however, this comes at a cost to usability. We propose the DidYouMean system which better balances usability and safety.
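The usability/safety trade-off described in the abstract can be illustrated with a minimal sketch of confidence-based thresholding. This is not the authors' implementation: the `Prediction` class, the `threshold_tradeoff` function, and the toy scores below are hypothetical and made up for illustration. The sketch assumes each predicted program comes with a confidence score in [0, 1] and that a program is executed only when its score meets the threshold.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Prediction:
    """Hypothetical record: one predicted program and whether it was right."""
    confidence: float  # assumed to be a calibrated score in [0, 1]
    correct: bool      # gold label, available only in offline evaluation


def threshold_tradeoff(preds: List[Prediction], threshold: float) -> Dict[str, float]:
    """Count what happens when only programs at or above `threshold` are executed."""
    executed = [p for p in preds if p.confidence >= threshold]
    incorrect_executed = sum(1 for p in executed if not p.correct)
    rejected = len(preds) - len(executed)
    # Coverage is the fraction of inputs the system acts on (a usability proxy).
    coverage = len(executed) / len(preds) if preds else 0.0
    return {
        "coverage": coverage,
        "incorrect_executed": incorrect_executed,  # a safety proxy
        "rejected": rejected,
    }


# Toy data, invented for illustration only.
toy = [
    Prediction(0.95, True),
    Prediction(0.90, True),
    Prediction(0.80, False),
    Prediction(0.60, True),
    Prediction(0.40, False),
    Prediction(0.20, False),
]

for t in (0.0, 0.5, 0.85):
    print(t, threshold_tradeoff(toy, t))
```

On this toy data, raising the threshold executes fewer incorrect programs but also rejects more inputs, including some correct ones, which is the cost to usability that the abstract refers to.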
