- The paper demonstrates that state-of-the-art oncology VLMs are vulnerable to prompt injection attacks, with GPT-4o showing a 70% attack success rate.
- The paper details how text, visual, and delayed visual injection methods significantly increase lesion miss rates across various imaging modalities.
- The paper emphasizes the need for enhanced guardrails and human oversight to safeguard clinical decisions amid these AI security flaws.
Prompt Injection Attacks on LLMs in Oncology
The paper "Prompt Injection Attacks on LLMs in Oncology" exposes critical security vulnerabilities within Vision-LLMs (VLMs) applied to medical tasks. VLMs are designed to handle multimodal data, interpreting both textual and visual inputs, making them potentially transformative for healthcare applications, such as the interpretation of medical images and enhancement of clinical decision support systems. However, the research outlines a significant flaw—the susceptibility of these models to prompt injection attacks, which can lead to outputting harmful information without access to their internal parameters.
Key Findings
The authors evaluated four state-of-the-art VLMs: Claude 3 Opus, Claude 3.5 Sonnet, Reka Core, and GPT-4o. The paper consisted of 297 attack scenarios revealing the models' vulnerabilities to prompt injection attacks, especially in a clinical context like oncology. The specific focus was on whether prompt injections could manipulate a model to ignore visible cancer lesions in medical images such as CT, MRI, and ultrasound scans.
- Organ Detection Rate: Only models achieving over 50% accuracy in identifying the liver from images were considered for further paper. This threshold was met by Claude-3 Opus, Claude 3.5 Sonnet, GPT-4o, and Reka Core.
- Lesion Miss Rate (LMR) and Attack Success Rate (ASR): The models exhibited varying degrees of susceptibility:
- Claude-3 Opus had LMRs of 52% (unaltered prompts) and 70% (prompt-injected prompts) with an ASR of 18%.
- Claude 3.5 Sonnet showed LMRs of 22% (unaltered) and 57% (prompt-injected) with an ASR of 35%.
- GPT-4o stood out with LMRs of 19% (unaltered) and 89% (prompt-injected), resulting in the highest ASR of 70%.
- Reka Core exhibited LMRs of 26% (unaltered) and 61% (prompt-injected) with an ASR of 36%.
Injection Techniques
Three primary strategies for prompt injection were evaluated: text prompt injection, visual prompt injection, and delayed visual prompt injection. All methods proved capable of manipulating model outputs significantly:
- Text Prompt Injection: This method showed harmful effects on the outputs uniformly across different models.
- Visual Prompt Injection: This approach yielded similar harmful impacts as text prompt injection, though Claude-3.5 showed some resistance.
- Delayed Visual Prompt Injection: This method was less effective in causing harmful outputs, possibly due to more effective guardrail interventions by the models.
The paper further analyzed the impact of variations such as the size and contrast of injected text, finding that sub-visual injections (low-contrast, small font) were equally harmful as more visible attacks. The modality of imaging also influenced the susceptibility, with ultrasound images showing the highest lesion miss rates compared to MRI and CT-A scans.
Implications and Future Directions
The findings underscore a critical security threat posed by prompt injection attacks on VLMs, particularly in high-stakes environments like healthcare. Given that these attacks can be performed as black-box attacks without direct access to model internals, they present a fundamental vulnerability that cannot be easily mitigated by existing technical safeguards, such as short-circuiting and other alignment techniques.
Theoretical Implications: This research spotlights the inherent weaknesses in AI input mechanisms, emphasizing the need for more robust, fundamentally secure AI system designs, especially as these technologies steadily integrate into critical healthcare infrastructure.
Practical Implications: For clinical adoption, the paper advocates for the deployment of strengthened guardrails and the integration of human oversight to vet critical model outputs, ensuring that clinical decisions retain an element of human responsibility and ethical scrutiny.
Future Developments: The paper suggests exploring agent-based defenses and human-in-the-loop systems as promising avenues to bolster security against prompt injection attacks. Furthermore, ongoing advancements in AI alignment and adversarial robustness research are crucial to constructing more resilient VLMs that can safely support clinical applications without compromising patient safety.
In conclusion, while the integration of VLMs into healthcare shows immense promise, this paper provides a timely and crucial alert to the necessity of addressing security vulnerabilities, particularly prompt injection attacks, to safeguard against potential harm in medical contexts.