
AmbigNLG: Addressing Task Ambiguity in Instruction for NLG (2402.17717v4)

Published 27 Feb 2024 in cs.CL

Abstract: We introduce AmbigNLG, a novel task designed to tackle the challenge of task ambiguity in instructions for Natural Language Generation (NLG). Ambiguous instructions often impede the performance of LLMs, especially in complex NLG tasks. To address this issue, we propose an ambiguity taxonomy that categorizes different types of instruction ambiguity and, guided by it, refine initial instructions with clearer specifications. Accompanying this task, we present AmbigSNI-NLG, a dataset comprising 2,500 instances annotated to facilitate research on AmbigNLG. Through comprehensive experiments with state-of-the-art LLMs, we demonstrate that our method significantly improves the alignment of generated text with user expectations, achieving up to a 15.02-point increase in ROUGE scores. Our findings highlight the critical importance of addressing task ambiguity to fully harness the capabilities of LLMs in NLG tasks. Furthermore, we confirm the effectiveness of our method in practical settings involving interactive ambiguity mitigation with users, underscoring the benefits of leveraging LLMs for interactive clarification.


Summary

  • The paper introduces a taxonomy-driven framework to identify and resolve instruction ambiguities in NLG tasks, leading to significant performance improvements.
  • It employs the AmbigSNI-NLG dataset with 2,500 annotated instances to systematically categorize ambiguities across six dimensions: Context, Keywords, Length, Planning, Style, and Theme.
  • Experimental results on both open-source and proprietary LLMs show up to a 15.02-point gain in ROUGE-L F1, emphasizing the practical benefits of clarity in task instructions.

AmbigNLG: Addressing Task Ambiguity in Instruction for NLG

The paper "AmbigNLG: Addressing Task Ambiguity in Instruction for NLG" introduces a method to improve the text generation capabilities of LLMs by mitigating ambiguities in Natural Language Generation (NLG) task instructions. While recent advancements in LLMs have enabled impressive performance across various benchmarks, their efficacy is often hampered by ambiguities in task instructions, which lead to discrepancies between generated outputs and user expectations. The authors propose AmbigNLG, a task framework focused on resolving these ambiguities, which is increasingly crucial given the growing reliance on LLMs for NLG tasks in practical settings.

Central to AmbigNLG is the development of an ambiguity taxonomy for systematically identifying and categorizing instruction ambiguities. This taxonomy consists of six categories: Context, Keywords, Length, Planning, Style, and Theme. It serves as the foundation for the AmbigSNI-NLG dataset, which comprises 2,500 instances sourced from Super-NaturalInstructions. Each instance is annotated with both the identified ambiguities and additional instructions aimed at resolving them.
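
The paragraph above implies a simple per-instance schema. The sketch below shows one plausible representation; the field names are assumptions for illustration and are not taken from the dataset release.

```python
# A minimal sketch of one AmbigSNI-NLG instance (field names assumed).
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class AmbigInstance:
    instruction: str                # original task instruction
    input_text: str                 # task input from Super-NaturalInstructions
    reference_output: str           # gold output the instruction should yield
    ambiguities: List[str] = field(default_factory=list)
    # maps each ambiguous taxonomy category to the sentence that resolves it
    additional_instructions: Dict[str, str] = field(default_factory=dict)

example = AmbigInstance(
    instruction="Summarize the article.",
    input_text="(article text)",
    reference_output="(gold summary)",
    ambiguities=["Length", "Style"],
    additional_instructions={
        "Length": "Keep the summary under 50 words.",
        "Style": "Write in a neutral, journalistic tone.",
    },
)
```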

To evaluate their approach, the authors conduct experiments with both open-source LLMs (e.g., LLaMA-2, Mistral, Mixtral) and proprietary models (e.g., GPT-3.5). Their comprehensive analysis demonstrates that reducing ambiguity in instructions significantly improves text generation quality, with gains of up to 15.02 points in ROUGE-L F1. This quantifiable improvement underscores the value of clear and precise task instructions in aligning generated outputs with user expectations.
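
The reported metric is ROUGE-L F1 between generated and gold outputs. As a sketch of how the before/after comparison can be computed, the snippet below uses Google's `rouge_score` package (`pip install rouge-score`); the paper's exact scoring script may differ.

```python
# Sketch of the evaluation comparison using the rouge_score package.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

def rouge_l_f1(reference: str, prediction: str) -> float:
    # fmeasure is the ROUGE-L F1 figure reported in the paper
    return scorer.score(reference, prediction)["rougeL"].fmeasure

# Compare outputs generated from the original vs. clarified instruction.
gold = "(gold output)"
gain = (rouge_l_f1(gold, "(output from clarified instruction)")
        - rouge_l_f1(gold, "(output from original instruction)"))
print(f"ROUGE-L F1 gain: {gain:+.2f}")
```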

Beyond addressing the immediate challenge of task ambiguities, the implications of this research extend to the broader field of AI. By alleviating ambiguity, the authors enhance LLM capabilities, potentially enabling their use in more nuanced and sophisticated applications where instruction clarity is pivotal. This could include more accurate dialogue systems, improved instructional content generation, and more reliable human-in-the-loop applications where understanding user intent is crucial.
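
For the interactive, human-in-the-loop setting mentioned above and in the abstract, a minimal clarification loop might look like the following. This is a hypothetical sketch: `detect_ambiguities` is assumed to wrap an LLM call as in the earlier snippet, and the user supplies the clarification instead of the model guessing.

```python
# Hypothetical sketch of interactive ambiguity mitigation with a user.
from typing import Callable, List

def interactive_clarify(instruction: str,
                        detect_ambiguities: Callable[[str], List[str]]) -> str:
    clarified = instruction
    for category in detect_ambiguities(instruction):
        # Ask the user to resolve each detected ambiguity directly.
        answer = input(f"The '{category}' aspect is unclear; please specify: ")
        if answer.strip():
            clarified += " " + answer.strip()
    return clarified
```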

For future developments, refining and expanding the taxonomy with additional categories or sub-categories of ambiguities could further enhance the robustness of AmbigNLG. Additionally, integrating adaptive learning techniques that allow models to implicitly learn ambiguity resolution strategies may lead to even more seamless interactions.

In conclusion, the introduction of AmbigNLG marks a significant advancement in enhancing the accuracy and reliability of LLMs in NLG contexts. By systematically addressing task ambiguities, this research contributes to the ongoing evolution of AI, supporting more effective and nuanced interpretations of human instructions. The theoretical and practical implications of this work are substantial, suggesting potential pathways for integrating ambiguity mitigation strategies into mainstream LLM deployments and broader AI systems.