System-Level Natural Language Feedback (2306.13588v3)
Abstract: Natural language (NL) feedback offers rich insights into user experience. While existing studies focus on an instance-level approach, where feedback is used to refine specific examples, we introduce a framework for the system-level use of NL feedback. We show how to use feedback to formalize system-level design decisions in a human-in-the-loop process, in order to produce better models. In particular, this is done through: (i) metric design for tasks; and (ii) LLM prompt design for refining model responses. We conduct two case studies of this approach for improving search query generation and dialog response generation, demonstrating the effectiveness of system-level feedback. We show that combining system-level and instance-level feedback brings further gains, and that human-written instance-level feedback results in more grounded refinements than GPT-3.5-written ones, underscoring the importance of human feedback for building systems. We release our code and data at https://github.com/yyy-Apple/Sys-NL-Feedback.
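The pipeline the abstract describes can be sketched at a high level: cluster a batch of NL feedback to surface recurring themes, then turn those themes into criteria embedded in an LLM refinement prompt. The sketch below is a minimal, illustrative assumption of that loop, not the paper's implementation; the bag-of-words embedding, the plain k-means routine, and all function names are hypothetical stand-ins.

```python
# Hypothetical sketch of system-level NL feedback use:
# (1) embed and cluster feedback to find recurring themes,
# (2) distill themes into criteria inside a refinement prompt.
from collections import Counter

def embed(text, vocab):
    """Toy bag-of-words embedding over a fixed vocabulary (assumption:
    the paper would use a learned sentence encoder instead)."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]

def kmeans(points, k, iters=10):
    """Minimal k-means: assign each point to its nearest centroid,
    then recompute centroids as cluster means."""
    centroids = points[:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            best = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[best].append(p)
        centroids = [
            [sum(dim) / len(c) for dim in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters

def build_refinement_prompt(criteria, query, response):
    """Embed feedback-derived criteria into an LLM refinement prompt."""
    bullet_list = "\n".join(f"- {c}" for c in criteria)
    return (
        "Rewrite the response so that it satisfies ALL criteria below.\n"
        f"Criteria (derived from user feedback):\n{bullet_list}\n"
        f"Query: {query}\n"
        f"Draft response: {response}\n"
        "Improved response:"
    )

# Usage: two well-separated feedback themes yield two clusters,
# which a human in the loop would then name as design criteria.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
clusters = kmeans(points, k=2)
prompt = build_refinement_prompt(
    ["Cite the retrieved source", "Avoid repeating the user's wording"],
    query="best hiking trails near Dublin",
    response="There are many trails.",
)
```

In the paper's framing the criteria distilled from clusters would drive both metric design and prompt design; here only the prompt-design half is sketched.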
Authors: Weizhe Yuan, Kyunghyun Cho, Jason Weston