Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
119 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Hacc-Man: An Arcade Game for Jailbreaking LLMs (2405.15902v1)

Published 24 May 2024 in cs.CR, cs.AI, cs.CL, and cs.HC

Abstract: The recent leaps in complexity and fluency of LLMs mean that, for the first time in human history, people can interact with computers using natural language alone. This creates monumental possibilities of automation and accessibility of computing, but also raises severe security and safety threats: When everyone can interact with LLMs, everyone can potentially break into the systems running LLMs. All it takes is creative use of language. This paper presents Hacc-Man, a game which challenges its players to "jailbreak" an LLM: subvert the LLM to output something that it is not intended to. Jailbreaking is at the intersection between creative problem solving and LLM security. The purpose of the game is threefold: 1. To heighten awareness of the risks of deploying fragile LLMs in everyday systems, 2. To heighten people's self-efficacy in interacting with LLMs, and 3. To discover the creative problem solving strategies, people deploy in this novel context.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (23)
  1. Albert Bandura. 1990. Perceived self-efficacy in the exercise of control over AIDS infection. Evaluation and program planning 13, 1 (1990), 9–17.
  2. Nicholas Caporusso et al. 2023. Generative Artificial Intelligence and the Emergence of Creative Displacement Anxiety. Research Directs in Psychology and Behavior 3, 1 (2023).
  3. Jonathan Cohen. 2023. Right on Track: NVIDIA Open-Source Software Helps Developers Add Guardrails to AI Chatbots — blogs.nvidia.com. https://blogs.nvidia.com/blog/ai-chatbot-guardrails-nemo/. [Accessed 10-04-2024].
  4. Assessing language model deployment with risk cards. arXiv preprint arXiv:2303.18190 (2023).
  5. Jeanette Falk and Nanna Inie. 2022. Materializing the abstract: Understanding AI by game jamming. Frontiers in Computer Science 4 (2022). https://doi.org/10.3389/fcomp.2022.959351
  6. Viktor Gecas. 1989. The social psychology of self-efficacy. Annual review of sociology 15, 1 (1989), 291–316.
  7. Riley Goodside. 2023. An unobtrusive image. https://twitter.com/goodside/status/1713041557589311863. [Accessed 10-04-2024].
  8. Alternate uses. (1978).
  9. Designing participatory ai: Creative professionals’ worries and expectations about generative ai. In Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems. 1–8.
  10. Summon a demon and bind it: A grounded theory of llm red teaming in the wild. arXiv preprint arXiv:2311.06237 (2023).
  11. Scott G. Isaksen. 2023. Developing Creative Potential: The Power of Process, People, and Place. Journal of Advanced Academics 34, 2 (2023), 111–144. https://doi.org/10.1177/1932202X231156389 arXiv:https://doi.org/10.1177/1932202X231156389
  12. Arjun Kharpal. [n. d.]. Chinese police arrest man who allegedly used ChatGPT to spread fake news in first case of its kind — cnbc.com. https://www.cnbc.com/2023/05/09/chinese-police-arrest-man-who-allegedly-used-chatgpt-to-spread-fake-news.html. [Accessed 11-04-2024].
  13. Jonah Lehrer. 2008. The eureka hunt. The New Yorker 28 (2008), 40–45.
  14. Against The Achilles’ Heel: A Survey on Red Teaming for Generative Models. arXiv preprint arXiv:2404.00629 (2024).
  15. Naming unrelated words predicts creativity. Proceedings of the National Academy of Sciences 118, 25 (2021), e2022340118.
  16. Creativity in the age of generative AI. Nat Hum Behav 7 (2023), 1836–1838. https://doi.org/10.1038/s41562-023-01751-1
  17. A StrongREJECT for Empty Jailbreaks. arXiv preprint arXiv:2402.10260 (2024).
  18. Joe Vest and James Tubberville. 2020. Red Team Development and Operations–A practical Guide. Independently Published (2020).
  19. Simon Willison. 2023. Prompt injection: What’s the worst that can happen? — simonwillison.net. https://simonwillison.net/2023/Apr/14/worst-that-can-happen/. [Accessed 10-04-2024].
  20. Simon Willison. 2024. Prompt injection and jailbreaking are not the same thing — simonwillison.net. https://simonwillison.net/2024/Mar/5/prompt-injection-jailbreaking/. [Accessed 10-04-2024].
  21. A New Era in LLM Security: Exploring Security Concerns in Real-World LLM-based Systems. arXiv preprint arXiv:2402.18649 (2024).
  22. Don’t Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models. arXiv preprint arXiv:2403.17336 (2024).
  23. Why Johnny can’t prompt: how non-AI experts try (and fail) to design LLM prompts. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 1–21.
User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (3)
  1. Matheus Valentim (2 papers)
  2. Jeanette Falk (6 papers)
  3. Nanna Inie (9 papers)
Citations (3)

Summary

An Analysis of "Hacc-Man: An Arcade Game for Jailbreaking LLMs"

The paper "Hacc-Man: An Arcade Game for Jailbreaking LLMs" by Matheus Valentim, Jeanette Falk, and Nanna Inie offers an innovative approach to examining the intersection of LLM security and creativity. This work conceptualizes "Hacc-Man," an arcade-style game designed to allow users to engage with LLMs creatively and adversarially. The game challenges players to subvert LLMs by engaging in activities known as jailbreaking, where the goal is to induce the model to produce unintended or restricted outputs. Such endeavors underscore the potential risks of deploying LLMs in various contexts while simultaneously fostering an understanding of creative problem-solving strategies.

Technical Contribution

Hacc-Man represents an intriguing foray into leveraging interactive design for security awareness and creative exploration within the field of LLMs. The game combines traditional gaming elements with advanced natural language interfaces, inviting participants to explore creative methodologies in prompt crafting, akin to red teaming tactics. This approach has multifaceted aims: to increase awareness of LLM vulnerabilities, expand users' creative problem-solving skills, and provide an empirical basis to categorize strategies users employ during such exercises.

Methodology and Design

The setup of Hacc-Man consists of a physical arcade machine that interacts with backend LLMs, offering challenges that mimic realistic scenarios a LLM might face in practice, such as generating misinformation or leaking sensitive data. These challenges require participants to employ diverse creative strategies to bypass modeled constraints imposed by the game's system instructions. The paper outlines six specific game challenges, each designed to test various levels of the model's guardrails, demonstrating different types of systemic failures such as topical errors, safety issues, and security breaches.

Hacc-Man's architecture features a flexible, online interactive interface created using modern web development tools. This system stores user input data for potential further research, thus providing a rich database of creative problem-solving strategies that can be analyzed for patterns, efficacy, and evolution over time.

Implications and Future Research

Hacc-Man's implementation highlights crucial aspects of LLM security, notably in how users interact creatively with LLMs. The project aligns user motivations with broader research objectives, facilitating a deeper understanding of self-efficacy in adopting AI technologies and contributing unique data for analyzing creative processes.

As the paper suggests, there are theoretical and practical implications. On a theoretical level, it explores the union of creativity and adversarial tactics, while practically, it serves as an educational tool for users to understand and potentially contribute to improving LLM security. Future research could explore refining these creative problem-solving metrics further, including examining the longitudinal changes in user strategy as familiarity with LLM interactions grows.

Developing a publicly accessible dataset of user-generated jailbreak attempts enables future studies on LLM security to empirically assess the robustness of various model architectures against human ingenuity. This platform could serve as a benchmark for examining how emergent properties in LLMs adapt under creative pressure.

Conclusion

"Hacc-Man: An Arcade Game for Jailbreaking LLMs" stands as a novel contribution to both interactive design and LLM security research fields. It marries the technical challenges of AI security with the human capacity for creativity, fostering an environment of playful exploration with serious implications. By opening this game for public interaction and data collection, the authors provide a substantial research platform ripe for future exploration in understanding and securing AI technologies. The paper situates itself as a foundational work in conceptualizing adversarial interactions with LLMs from a user-centric perspective, emphasizing the importance of creativity in addressing technological vulnerabilities.

Youtube Logo Streamline Icon: https://streamlinehq.com