The paper "Self-Prompt Tuning: Enable Autonomous Role-Playing in LLMs" introduces a novel technique for enhancing the autonomous role-playing capabilities of LLMs by allowing them to generate their own role-play prompts through a process termed self-prompt tuning. The core idea is to enable the LLMs to simulate expert roles effectively without manual prompt engineering, which typically requires significant expertise and iterative refinement.
Overview
- Problem Context: Role-play prompting, where LLMs simulate domain-specific experts, has proven effective in improving model performance across various tasks. However, designing such prompts manually is task-specific and resource-intensive.
- Solution Approach: Self-prompt tuning fine-tunes LLMs to generate role-play prompts automatically. The fine-tuning recipe is akin to instruction tuning but incorporates self-generation of prompts.
- Dataset and Methodology:
  - The LIMA dataset serves as the base and is extended with role-play annotations generated by GPT-4, yielding the new LIMA-Role dataset.
  - LLMs such as Llama-2-7B and Mistral-7B are fine-tuned on LIMA-Role. The data are structured as user-AI assistant interactions, with predefined system prompts outlining the roles (a minimal sketch of such a training example follows this list).
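To make the data layout concrete, below is a minimal sketch of what a LIMA-Role training example and its flattened chat format might look like. The field names, system prompt wording, chat markers, and the `to_training_text` helper are illustrative assumptions, not the paper's exact schema.

```python
# Hypothetical LIMA-Role training example: a LIMA instruction whose target response
# is prefixed with a GPT-4-generated role description (illustrative content only).
lima_role_example = {
    # Predefined system prompt telling the model to produce its own role-play prompt.
    "system": (
        "You are an AI assistant. Before answering, first describe the expert "
        "role best suited to the user's request, then answer in that role."
    ),
    # Original LIMA-style user instruction.
    "user": "Explain why the sky appears blue to an observer on the ground.",
    # Target: self-generated role-play prompt followed by the answer.
    "assistant": (
        "[Role] As an atmospheric physicist specializing in light scattering...\n"
        "[Answer] Sunlight is scattered by gas molecules in the atmosphere; shorter "
        "(blue) wavelengths scatter more strongly (Rayleigh scattering), so ..."
    ),
}


def to_training_text(example: dict) -> str:
    """Flatten one example into a single chat-formatted string for causal LM fine-tuning.

    The <|system|>/<|user|>/<|assistant|> markers are placeholders; a real setup would
    use the chat template of the base model being fine-tuned.
    """
    return (
        f"<|system|>\n{example['system']}\n"
        f"<|user|>\n{example['user']}\n"
        f"<|assistant|>\n{example['assistant']}"
    )


print(to_training_text(lima_role_example))
```

In this framing, the only change relative to plain instruction tuning is the target text: the model learns to emit a role description of its own before the answer, rather than relying on a hand-written role-play prompt at inference time.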
Contributions and Empirical Evaluation
- Contributions:
  - Introduction of self-prompt tuning to automate role-play prompting, reducing human intervention in prompt design.
  - Creation and release of the LIMA-Role dataset, which augments LIMA with role descriptions for fine-tuning LLMs.
  - Demonstration that self-prompt tuned LLMs outperform standard instruction-tuned models across multiple NLP benchmarks.
- Evaluation: The effectiveness of self-prompt tuning is validated across:
  - NLP Benchmarks: Improvements over instruction-tuned baselines on multi-domain QA datasets such as MMLU and StrategyQA, and on single-domain tasks such as HumanEval (code) and GSM8K (math).
  - Open-ended Questions: On a test set of challenging, open-ended questions, self-prompt tuned LLMs outperformed models instruction-tuned solely on LIMA.
- Key Findings:
  - Self-prompt tuned models show substantial improvements in generating context-appropriate role-play prompts over traditional instruction-tuned models (see the inference sketch after this list).
  - The improvements hold across most, but not all, evaluated datasets, indicating broad applicability while leaving room for further refinement and tuning.
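As a hedged illustration of the evaluation setup, the sketch below shows how a self-prompt tuned model might be queried with a plain question and left to generate its own role-play prompt before answering. The checkpoint name, chat markers, and system prompt are placeholders for illustration, not artifacts released with the paper.

```python
# Minimal inference sketch: no hand-crafted role prompt is supplied; the role
# description in the completion comes entirely from the fine-tuned model.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder base checkpoint; a LIMA-Role fine-tuned checkpoint would go here.
MODEL_NAME = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

question = "A train travels 60 km in 45 minutes. What is its average speed in km/h?"
prompt = (
    "<|system|>\nBefore answering, first state the expert role best suited to the "
    "question, then answer in that role.\n"
    f"<|user|>\n{question}\n<|assistant|>\n"
)

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens: the self-generated role prompt plus the answer.
completion = tokenizer.decode(
    output_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(completion)
```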
Limitations and Future Directions
- Data Scale: LIMA-Role is limited in scale and may not be sufficient for fine-tuning larger models, which constrains how the approach compares against larger, more complex models.
- Prompt Design Sensitivity: The design of the prompts used during fine-tuning still influences model performance, though less pronouncedly than in zero-shot or few-shot settings, leaving room for optimization.
- Future Work: Extending the methodology to other complex prompting strategies, such as least-to-most prompting or tree-of-thought prompting, remains an open avenue for exploration.
In summary, self-prompt tuning offers a promising approach to automating role-play with LLMs: it streamlines prompt design and improves model adaptability and effectiveness on role-specific tasks without extensive manual input.