Dancing in Chains: Reconciling Instruction Following and Faithfulness in Language Models (2407.21417v1)
Abstract: Modern large language models (LLMs) need to follow human instructions while remaining faithful; yet, they often fail to achieve both. Here, we provide concrete evidence of a trade-off between instruction following (i.e., following open-ended instructions) and faithfulness (i.e., grounding responses in the given context) when training LMs with these objectives. For instance, fine-tuning LLaMA-7B on instruction-following datasets renders it less faithful. Conversely, instruction-tuned Vicuna-7B shows degraded performance at following instructions when further optimized on tasks that require contextual grounding. One common remedy is multi-task learning (MTL) with data mixing, yet it remains far from achieving a synergistic outcome. We propose a simple yet effective method that relies on Rejection Sampling for Continued Self-instruction Tuning (ReSet), which significantly outperforms vanilla MTL. Surprisingly, we find that less is more: training ReSet with high-quality yet substantially smaller data (three-fold less) yields superior results. Our findings offer a better understanding of objective discrepancies in alignment training of LMs.
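The core idea behind ReSet, as described in the abstract, can be sketched as a rejection-sampling loop over self-generated data: sample several candidate responses per prompt, keep only those that pass a quality filter, and reuse the (smaller, higher-quality) survivors for continued fine-tuning. The sketch below is illustrative only; `generate_candidates`, `score`, and the threshold are hypothetical placeholders, not the paper's actual components.

```python
from dataclasses import dataclass
import random


@dataclass
class Example:
    prompt: str
    response: str
    score: float


def generate_candidates(prompt: str, k: int = 4) -> list[str]:
    # Placeholder for sampling k responses from the current model.
    return [f"response-{i} to {prompt!r}" for i in range(k)]


def score(prompt: str, response: str) -> float:
    # Placeholder quality/faithfulness scorer (e.g., an LM judge).
    # Deterministic pseudo-score so the sketch is reproducible.
    rng = random.Random(f"{prompt}|{response}")
    return rng.random()


def reset_round(prompts: list[str], k: int = 4,
                threshold: float = 0.8) -> list[Example]:
    """One rejection-sampling round: keep only high-scoring self-generations."""
    kept = []
    for p in prompts:
        for r in generate_candidates(p, k):
            s = score(p, r)
            if s >= threshold:  # reject low-quality candidates
                kept.append(Example(p, r, s))
    # The survivors form the data for the next round of continued tuning.
    return kept


data = reset_round(["Summarize the given passage."], k=8, threshold=0.5)
```

The "less is more" finding in the abstract corresponds to the filtering step: raising the threshold shrinks the retained set while improving its average quality.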
Zhengxuan Wu
Yuhao Zhang
Peng Qi
Yumo Xu
Rujun Han
Yian Zhang
Jifan Chen
Bonan Min
Zhiheng Huang