Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models (2404.02823v1)

Published 3 Apr 2024 in cs.CL, cs.AI, and cs.LG

Abstract: The ability of LLMs to follow instructions is crucial to real-world applications. Despite recent advances, several studies have highlighted that LLMs struggle when faced with challenging instructions, especially those that include complex constraints, hindering their effectiveness in various tasks. To address this challenge, we introduce Conifer, a novel instruction tuning dataset designed to enhance the ability of LLMs to follow multi-level instructions with complex constraints. Utilizing GPT-4, we curate the dataset through a series of LLM-driven refinement processes to ensure high quality. We also propose a progressive learning scheme that emphasizes an easy-to-hard progression and learning from process feedback. Models trained with Conifer exhibit remarkable improvements in instruction-following abilities, especially for instructions with complex constraints. On several instruction-following benchmarks, our 7B model outperforms the state-of-the-art open-source 7B models and even exceeds the performance of models 10 times larger on certain metrics. All code and the Conifer dataset are available at https://www.github.com/ConiferLM/Conifer.
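The abstract's easy-to-hard progressive learning scheme can be illustrated with a minimal curriculum sketch. This is not the paper's implementation; it is a hedged illustration assuming difficulty is approximated by the number of constraints attached to each instruction (the `Example` class and `easy_to_hard_stages` function are hypothetical names introduced here):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Example:
    """A hypothetical instruction-tuning example with explicit constraints."""
    instruction: str
    constraints: List[str] = field(default_factory=list)

def easy_to_hard_stages(examples: List[Example], num_stages: int = 3) -> List[List[Example]]:
    """Split examples into curriculum stages, ordered from fewest to most constraints.

    Each stage would be consumed in sequence during fine-tuning, so the model
    sees lightly constrained instructions before heavily constrained ones.
    """
    # Sort by a simple difficulty proxy: constraint count (stable sort keeps ties in order).
    ordered = sorted(examples, key=lambda ex: len(ex.constraints))
    stage_size = -(-len(ordered) // num_stages)  # ceiling division
    return [ordered[i:i + stage_size] for i in range(0, len(ordered), stage_size)]

# Usage: examples with 3, 0, and 5 constraints end up in ascending-difficulty stages.
data = [
    Example("Summarize the article.", ["under 50 words", "formal tone", "no bullet points"]),
    Example("Translate this sentence."),
    Example("Write a poem.", ["haiku", "about autumn", "no rhymes", "include 'maple'", "English"]),
]
stages = easy_to_hard_stages(data, num_stages=3)
```

Conifer's actual scheme additionally incorporates process feedback during training; this sketch covers only the easy-to-hard ordering component.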

Authors (7)
  1. Haoran Sun (65 papers)
  2. Lixin Liu (22 papers)
  3. Junjie Li (97 papers)
  4. Fengyu Wang (18 papers)
  5. Baohua Dong (6 papers)
  6. Ran Lin (5 papers)
  7. Ruohui Huang (2 papers)
Citations (10)