Guiding Large Language Models via Directional Stimulus Prompting (2302.11520v4)

Published 22 Feb 2023 in cs.CL

Abstract: We introduce Directional Stimulus Prompting, a novel framework for guiding black-box LLMs toward specific desired outputs. Instead of directly adjusting LLMs, our method employs a small tunable policy model (e.g., T5) to generate an auxiliary directional stimulus prompt for each input instance. These directional stimulus prompts act as nuanced, instance-specific hints and clues to guide LLMs in generating desired outcomes, such as including specific keywords in the generated summary. Our approach sidesteps the challenges of direct LLM tuning by optimizing the policy model to explore directional stimulus prompts that align LLMs with desired behaviors. The policy model can be optimized through 1) supervised fine-tuning using labeled data and 2) reinforcement learning from offline or online rewards based on the LLM's output. We assess our method across summarization, dialogue response generation, and chain-of-thought reasoning tasks. Our experiments demonstrate that the framework consistently improves LLMs' (e.g., ChatGPT, Codex, InstructGPT) performance on these supervised tasks using minimal labeled data. Notably, using just 80 dialogues on the MultiWOZ dataset, our approach enhances ChatGPT's performance by an impressive 41.4%, matching or surpassing some fully supervised state-of-the-art models. Additionally, the instance-specific chain-of-thought prompt generated by our approach improves InstructGPT's reasoning accuracy compared to human-crafted or automatically generated prompts. The code and data are publicly available at https://github.com/Leezekun/Directional-Stimulus-Prompting.

Introduction to Directional Stimulus Prompting

LLMs have revolutionized natural language processing, exhibiting capabilities absent in earlier, smaller language models. However, directly optimizing LLMs for specific tasks remains a daunting challenge, especially since these models are often accessible only through black-box APIs. Moreover, their sheer scale raises cost and accessibility barriers. As an alternative to modifying the models themselves, research has turned toward optimizing the prompts used to interact with them.

A Novel Approach with Directional Stimulus

To refine the guidance provided to LLMs, a novel framework, Directional Stimulus Prompting (DSP), is introduced. Unlike prior work that relies on task-specific instructions or external knowledge augmentation, DSP integrates a "directional stimulus", i.e., instance-specific hints and clues, into the prompt. These cues steer the LLM toward outputs that align more closely with task-specific references or goals.
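For concreteness, the sketch below shows one way such an instance-specific hint (here, keywords) could be folded into a summarization prompt sent to a black-box LLM. The prompt template, the model name, and the OpenAI client call are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch: inserting a keyword-style directional stimulus into a
# summarization prompt for a black-box LLM. Template wording and model
# choice are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def build_prompt(article: str, hint_keywords: list[str]) -> str:
    # In DSP, the tunable policy model would produce hint_keywords
    # for this specific article.
    hints = "; ".join(hint_keywords)
    return (
        f"Article: {article}\n"
        f"Keywords: {hints}\n"
        "Summarize the article in 2-3 sentences, making sure the summary "
        "reflects the given keywords.\n"
        "Summary:"
    )

def summarize(article: str, hint_keywords: list[str]) -> str:
    prompt = build_prompt(article, hint_keywords)
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    return response.choices[0].message.content
```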

Policy Model Training and Reinforcement Learning

To generate the directional stimulus, a smaller, tunable policy model such as T5 is used, which sidesteps the complexity of modifying the LLM directly. The policy model is first trained with supervised fine-tuning on labeled data. It is then further optimized with reinforcement learning to discover stimulus prompts that earn high rewards, as measured by LLM performance metrics or human preference.
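As a rough illustration of the supervised fine-tuning stage, the sketch below trains a small T5 policy model to map an input article to a keyword-style stimulus using the standard sequence-to-sequence loss from Hugging Face transformers. The "Generate keywords:" prefix, the toy example, and the hyperparameters are assumptions for illustration; the later reinforcement-learning stage, which scores the LLM's output to obtain a reward, is only noted in the comments.

```python
# Hedged sketch of the supervised fine-tuning stage for the policy model.
# Assumes a small T5 checkpoint; data, prefix, and hyperparameters are
# illustrative, not the paper's actual configuration.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
policy = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(policy.parameters(), lr=2e-5)

# One hypothetical (input, stimulus) pair: the target is the keyword hint
# that should steer the LLM toward the reference summary.
article = "The city council approved a new recycling program on Monday..."
target_keywords = "city council; recycling program; Monday"

inputs = tokenizer("Generate keywords: " + article,
                   return_tensors="pt", truncation=True)
labels = tokenizer(target_keywords, return_tensors="pt",
                   truncation=True).input_ids

policy.train()
loss = policy(**inputs, labels=labels).loss  # standard seq2seq cross-entropy
loss.backward()
optimizer.step()
optimizer.zero_grad()

# At inference time the tuned policy proposes a stimulus for a new article,
# which is inserted into the LLM prompt (see the previous sketch). In the RL
# stage, the LLM's resulting output would be scored to reward the policy.
policy.eval()
with torch.no_grad():
    generated = policy.generate(
        **tokenizer("Generate keywords: " + article, return_tensors="pt"),
        max_new_tokens=32,
    )
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```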

Empirical Assessment of the Framework

The DSP framework's effectiveness was evaluated on summarization, dialogue response generation, and chain-of-thought reasoning tasks. Introducing keywords as directional stimuli improved ChatGPT's summarization quality, and on dialogue response generation, performance improved by over 40% using only 80 labeled dialogues from MultiWOZ. The framework proved adept at guiding LLMs toward desired outcomes, demonstrating potential for versatile application across different LLMs and tasks.
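Since the reinforcement-learning stage rewards the policy model according to how well the LLM's output matches the reference, a standard metric such as ROUGE can serve as the reward signal for summarization. The snippet below is a hedged sketch of such a reward function using the rouge_score package; the exact metric combination used in the paper is not reproduced here.

```python
# Hedged sketch of a summarization reward: score the LLM's summary (produced
# from the stimulus-augmented prompt) against the reference. Using averaged
# ROUGE F1 as the scalar reward is an assumption for illustration.
from rouge_score import rouge_scorer

_scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                   use_stemmer=True)

def reward(llm_summary: str, reference_summary: str) -> float:
    scores = _scorer.score(reference_summary, llm_summary)
    # Average the F1 scores of the three ROUGE variants into one reward value.
    return sum(s.fmeasure for s in scores.values()) / len(scores)

# Example: a higher reward indicates the stimulus steered the LLM closer
# to the reference summary.
print(reward("The council approved a recycling program.",
             "The city council approved a new recycling program on Monday."))
```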

Authors (6)
  1. Zekun Li (73 papers)
  2. Baolin Peng (72 papers)
  3. Pengcheng He (60 papers)
  4. Michel Galley (50 papers)
  5. Jianfeng Gao (344 papers)
  6. Xifeng Yan (52 papers)
Citations (75)