AmbigNLG: Addressing Task Ambiguity in Instructions for NLG (2402.17717v4)
Abstract: We introduce AmbigNLG, a novel task designed to tackle the challenge of task ambiguity in instructions for Natural Language Generation (NLG). Ambiguous instructions often impede the performance of Large Language Models (LLMs), especially in complex NLG tasks. To address this issue, we propose an ambiguity taxonomy that categorizes different types of instruction ambiguity, along with a method that refines initial instructions with clearer specifications. Accompanying this task, we present AmbigSNI-NLG, a dataset comprising 2,500 instances annotated to facilitate research in AmbigNLG. Through comprehensive experiments with state-of-the-art LLMs, we demonstrate that our method significantly enhances the alignment of generated text with user expectations, achieving up to a 15.02-point increase in ROUGE scores. Our findings highlight the critical importance of addressing task ambiguity to fully harness the capabilities of LLMs in NLG tasks. Furthermore, we confirm the effectiveness of our method in practical settings involving interactive ambiguity mitigation with users, underscoring the benefits of leveraging LLMs for interactive clarification.
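As a concrete illustration of the two-step mitigation loop the abstract describes (identify the ambiguity categories an instruction leaves open, then refine the instruction with explicit specifications), below is a minimal Python sketch. It assumes an OpenAI-compatible chat API; the taxonomy labels, model name, prompts, and function names are illustrative placeholders rather than the paper's exact categories or implementation.

```python
# Minimal sketch of a taxonomy-based ambiguity-mitigation loop, assuming an
# OpenAI-compatible chat endpoint (openai>=1.0). Category names are
# illustrative placeholders, not necessarily the paper's exact taxonomy.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

TAXONOMY = ["context", "keywords", "length", "planning", "style", "theme"]


def identify_ambiguity(instruction: str) -> list[str]:
    """Ask the model which taxonomy categories the instruction leaves unspecified."""
    prompt = (
        "Given the NLG instruction below, list which of these aspects are "
        f"left ambiguous (comma-separated, subset of {TAXONOMY}):\n\n{instruction}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    answer = resp.choices[0].message.content.lower()
    return [c for c in TAXONOMY if c in answer]


def refine_instruction(instruction: str, categories: list[str]) -> str:
    """Rewrite the instruction with explicit specifications for each ambiguous category."""
    prompt = (
        "Rewrite this instruction, adding explicit specifications for "
        f"{', '.join(categories)} so the task is unambiguous:\n\n{instruction}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


instruction = "Summarize the following article."
ambiguous = identify_ambiguity(instruction)
if ambiguous:
    instruction = refine_instruction(instruction, ambiguous)
print(instruction)
```

In the interactive setting the abstract mentions, the refinement step would instead surface the identified categories to the user as clarification questions; this sketch folds both steps into single model calls for brevity.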