Introduction to Directional Stimulus Prompting
LLMs have revolutionized natural language processing, demonstrating capabilities absent from earlier, smaller language models. However, directly optimizing LLMs for specific tasks remains a daunting challenge, especially since these models are often available only through black-box APIs. Their sheer scale also raises cost and accessibility barriers. As an alternative to modifying the models themselves, research efforts have turned toward optimizing the prompts used to interact with them.
A Novel Approach with Directional Stimulus
To refine the guidance provided to LLMs, a novel framework, Directional Stimulus Prompting (DSP), is introduced. Unlike prior work that relies on task-specific instructions or external knowledge augmentation, DSP integrates a "directional stimulus" (hints such as keywords) into the prompt. The stimulus offers instance-specific cues that steer the LLM toward a desired outcome, providing a lightweight way to generate outputs that align more closely with specific references or goals.
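To make the idea concrete, here is a minimal sketch of how such a stimulus might be spliced into a summarization prompt. The prompt wording, the `build_prompt` helper, and the example keywords are illustrative assumptions, not the paper's exact template:

```python
def build_prompt(article: str, keywords: list[str]) -> str:
    """Splice instance-specific hint keywords into a summarization prompt."""
    hint = "; ".join(keywords)
    return (
        f"Article: {article}\n"
        f"Keywords: {hint}\n"
        "Summarize the article above in two to three sentences, "
        "making sure the summary covers the listed keywords."
    )

print(build_prompt(
    "Bob Barker returned to host The Price Is Right for one segment ...",
    ["Bob Barker", "The Price Is Right", "return"],
))
```

Because the keywords are produced per input instance, the same template yields instance-specific guidance without any change to the LLM itself.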
Policy Model Training and Reinforcement Learning
To generate this directional stimulus, a smaller, tunable policy model such as T5 is used, sidestepping the complexity and cost of modifying the LLM directly. The policy model is first trained with supervised fine-tuning on labeled data. It is then optimized with reinforcement learning to discover stimuli that earn high rewards, where the reward is defined by LLM performance metrics or human preference.
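The supervised stage can be pictured as ordinary sequence-to-sequence fine-tuning, where the policy model learns to map an input (e.g., an article) to a stimulus (e.g., keywords drawn from the reference summary). The sketch below assumes Hugging Face `transformers` and a single training pair; the task prefix and hyperparameters are illustrative:

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# One illustrative (article, stimulus) pair; in practice the targets are
# keywords extracted from the labeled reference summaries.
article = "Bob Barker returned to host The Price Is Right for one segment ..."
stimulus = "Bob Barker; The Price Is Right; return"

inputs = tokenizer(
    "Generate keywords for the article: " + article,
    return_tensors="pt", truncation=True, max_length=512,
)
labels = tokenizer(stimulus, return_tensors="pt").input_ids

# Standard sequence-to-sequence cross-entropy loss on the stimulus tokens.
loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```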
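For the reinforcement learning stage, the scalar reward for a generated stimulus can be computed by prompting the black-box LLM with that stimulus and scoring the output against a reference. Below is a minimal sketch assuming a summarization task scored with ROUGE; `call_llm` is a hypothetical API wrapper, and the exact reward in the paper may weight metrics differently or use human preference instead:

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around a black-box LLM API such as ChatGPT."""
    raise NotImplementedError

def compute_reward(article: str, stimulus: str, reference: str) -> float:
    # Prompt the frozen LLM with the article plus the policy-generated hints.
    prompt = (
        f"Article: {article}\nKeywords: {stimulus}\n"
        "Summarize the article above, covering the listed keywords."
    )
    summary = call_llm(prompt)
    # Score the LLM's output against the reference; the mean of two ROUGE
    # F1 scores serves as the scalar reward for the policy update.
    scores = scorer.score(reference, summary)
    return 0.5 * (scores["rouge1"].fmeasure + scores["rougeL"].fmeasure)
```

This reward would then drive a PPO-style policy-gradient update of the T5 policy model, making stimuli that earn higher rewards more likely.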
Empirical Assessment of the Framework
The DSP framework's effectiveness was evaluated on tasks including summarization, dialogue response generation, and chain-of-thought reasoning. The results were noteworthy: supplying keywords as directional stimuli improved ChatGPT's performance, and on dialogue response generation performance improved by over 40% on specific metrics. The framework proved adept at guiding LLMs toward desired outcomes, demonstrating its potential to generalize across LLMs and diverse tasks.