Soft Begging: Modular and Efficient Shielding of LLMs against Prompt Injection and Jailbreaking based on Prompt Tuning (2407.03391v1)

Published 3 Jul 2024 in cs.CR, cs.AI, and cs.CL

Abstract: Prompt injection (both direct and indirect) and jailbreaking are now recognized as significant issues for LLMs, particularly due to their potential for harm in application-integrated contexts. This extended abstract explores a novel approach to protecting LLMs from such attacks, termed "soft begging." This method involves training soft prompts to counteract the effects of corrupted prompts on the LLM's output. We provide an overview of prompt injections and jailbreaking, introduce the theoretical basis of the "soft begging" technique, and discuss an evaluation of its effectiveness.

References (17)

Collections

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Paper Prompts

Explore 10 Community Prompts

Follow-up Questions

We haven't generated follow-up questions for this paper yet.

Generate Now

Soft Begging: Modular and Efficient Shielding of LLMs against Prompt Injection and Jailbreaking based on Prompt Tuning (2407.03391v1)

Collections

Summary

Paper Prompts

Follow-up Questions

Authors (5)

Don't miss out on important new AI/ML research

Soft Begging: Modular and Efficient Shielding of LLMs against Prompt Injection and Jailbreaking based on Prompt Tuning (2407.03391v1)

Collections

Summary

Paper Prompts

Follow-up Questions

Related Papers

Authors (5)

Don't miss out on important new AI/ML research