Papers

Topics

Authors

Recent

View all

Detailed Answer

Quick Answer

Concise responses based on abstracts only

Detailed Answer

Well-researched responses based on abstracts and relevant paper content.

Custom Instructions Pro

Preferences or requirements that you'd like Emergent Mind to consider when generating responses

Gemini 2.5 Flash

Gemini 2.5 Flash 82 tok/s

Gemini 2.5 Pro 52 tok/s Pro

GPT-5 Medium 19 tok/s Pro

GPT-5 High 17 tok/s Pro

GPT-4o 107 tok/s Pro

Kimi K2 174 tok/s Pro

GPT OSS 120B 468 tok/s Pro

Claude Sonnet 4 37 tok/s Pro

2000 character limit reached

To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models (2502.12202v2)

Published 16 Feb 2025 in cs.CL, cs.AI, and cs.LG

Abstract: Large Reasoning Models (LRMs) are designed to solve complex tasks by generating explicit reasoning traces before producing final answers. However, we reveal a critical vulnerability in LRMs -- termed Unthinking Vulnerability -- wherein the thinking process can be bypassed by manipulating special delimiter tokens. It is empirically demonstrated to be widespread across mainstream LRMs, posing both a significant risk and potential utility, depending on how it is exploited. In this paper, we systematically investigate this vulnerability from both malicious and beneficial perspectives. On the malicious side, we introduce Breaking of Thought (BoT), a novel attack that enables adversaries to bypass the thinking process of LRMs, thereby compromising their reliability and availability. We present two variants of BoT: a training-based version that injects backdoor during the fine-tuning stage, and a training-free version based on adversarial attack during the inference stage. As a potential defense, we propose thinking recovery alignment to partially mitigate the vulnerability. On the beneficial side, we introduce Monitoring of Thought (MoT), a plug-and-play framework that allows model owners to enhance efficiency and safety. It is implemented by leveraging the same vulnerability to dynamically terminate redundant or risky reasoning through external monitoring. Extensive experiments show that BoT poses a significant threat to reasoning reliability, while MoT provides a practical solution for preventing overthinking and jailbreaking. Our findings expose an inherent flaw in current LRM architectures and underscore the need for more robust reasoning systems in the future.

Collections

Summary

Analysis of "BoT: Breaking Long Thought Processes of o1-like LLMs through Backdoor Attack"

This paper addresses a significant vulnerability within o1-like LLMs that may arise from their extended reasoning capabilities. The authors introduce a novel threat model that exploits these capabilities through a backdoor attack, leveraging the trade-off between performance and computational resource consumption during inference. Their proposed attack, BoT (Break CoT), is designed to force LLMs to bypass their intrinsic reasoning mechanisms—commonly leveraged for complex tasks—by using backdoor techniques.

The primary focus of this research is on LLMs with highly developed reasoning mechanisms, like those inspired by OpenAI-o1. These models, and their variants such as DeepSeek-R1 and Marco-o1, demonstrate impressive performance improvements on tasks that require deep reasoning due to their ability to generate extensive thought processes. However, their reliance on longer inference periods invites potential vulnerabilities, which this paper successfully highlights.

In essence, BoT relies on constructing poisoned datasets containing input-output pairs with specific triggers that bypass the model's reasoning path when activated. This approach uses two implementation techniques: supervised fine-tuning (SFT) and direct preference optimization (DPO). The paper presents extensive empirical results showing that BoT successfully achieves high attack success rates while maintaining clean accuracy across various o1-like models. For example, the attack success rate for DeepSeek-R1-7B using BoT was consistently approaching 100%, showcasing the robustness of the attack method.

The implications of this vulnerability are multifaceted:

Security Risks: Identifying this vulnerability within LLMs raises significant security concerns, primarily due to the ease with which sophisticated actors might exploit it to degrade LLM performance purposefully. This emphasizes an urgent need for developing defense mechanisms to counteract such threats.
Customizable Inference: Intriguingly, the exploitation of this flaw could yield beneficial applications. Users might leverage BoT to autonomously adjust the runtime behavior of models based on task complexity. For simpler tasks, bypassing comprehensive thought processes may enhance efficiency without affecting accuracy.
Model Robustness: By revealing such vulnerabilities, this paper pushes the frontier for further research into making LLMs more robust and reliable under adversarial conditions. Insights from this paper might inspire future work aimed at understanding deeper security dynamics and architectural improvements.

While BoT exploits a nuanced aspect of LLM operation, the paper also briefly discusses potential mitigations, offering preliminary insights into defenses like input purification and tuning-based mitigation. Yet, these methods show limited effectiveness against the BoT attack, suggesting that more sophisticated safeguarding techniques are necessary.

In conclusion, this paper makes a substantial contribution to understanding the potential vulnerabilities inherent in contemporary o1-like models with sophisticated reasoning capabilities. Future research will likely build on this work, seeking to develop more robust LLM frameworks and new defense strategies that can more effectively recognize and mitigate backdoors without impairing the natural inference capabilities these models offer. As LLMs become more integral to AI applications, addressing these vulnerabilities proactively is essential for their safe and dependable deployment.