Evaluating, Understanding, and Improving Constrained Text Generation for Large Language Models (2310.16343v2)
Abstract: Advancements in natural language generation (NLG) and large language models (LLMs) have led to proficient text generation in various tasks. However, integrating intricate constraints into neural text generation remains challenging because of LLMs' opacity. This study investigates constrained text generation for LLMs, where predefined constraints are applied during the LLM's generation process. Our research focuses mainly on mainstream open-source LLMs and categorizes constraints into lexical, structural, and relation-based types. We also present various benchmarks to facilitate fair evaluation. The study addresses key research questions on evaluating, understanding, and improving constrained text generation for LLMs. The results illuminate LLMs' capacities and deficiencies in incorporating constraints and provide insights for future developments in constrained text generation. Code and datasets will be released upon acceptance.
Authors: Xiang Chen, Xiaojun Wan