Invisible Prompts, Visible Threats: An Analysis of Malicious Font Injection in LLMs
This paper investigates vulnerabilities that arise when LLMs are given real-time web search capabilities and tool integrations such as the Model Context Protocol (MCP). The specific focus is on malicious font injection: hiding adversarial prompts in external resources such as web pages by manipulating the code-to-glyph mapping. These threats are significant because the injected instructions are invisible to human readers yet fully legible to LLMs, which process the underlying character codes rather than the rendered glyphs.
Overview of Malicious Font Injection
The core mechanism is font manipulation: an attacker modifies a font file's binary so that the mapping between character codes and visual glyphs no longer matches. This lets the attacker control how text appears when rendered while leaving the underlying character codes, which LLMs actually process, unchanged. The paper demonstrates how such modified fonts can inject instructions covertly, effectively bypassing defenses that rely on visual inspection to detect adversarial content.
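To make the mechanism concrete, the sketch below shows how a code-to-glyph remapping could be performed with the fontTools library. The file names and the specific substitution are illustrative assumptions, not the paper's actual tooling.

```python
# Minimal sketch (assumed workflow, not the paper's exact tooling): remap a
# character code to a different glyph so that rendered text diverges from the
# underlying codes an LLM would read.
from fontTools.ttLib import TTFont

font = TTFont("original.ttf")  # hypothetical benign source font

for table in font["cmap"].tables:
    cmap = getattr(table, "cmap", None)  # skip subtables without a plain code-to-glyph dict
    if not cmap:
        continue
    # Example substitution: make the code point for "a" render with the glyph
    # normally used for "x". A reader sees "x"; an LLM still receives "a".
    if ord("a") in cmap and ord("x") in cmap:
        cmap[ord("a")] = cmap[ord("x")]

font.save("malicious.ttf")  # this font is then embedded in an HTML or PDF document
```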
Two critical scenarios are evaluated:
- Malicious Content Relay: This scenario examines whether LLMs will process hidden malicious content from external sources and relay it to users. Both HTML and PDF documents are tested to observe how the distribution of hidden prompts and the document's structural complexity affect the model's susceptibility (a minimal HTML sketch of such a page follows this list).
- Sensitive Data Leakage: This scenario evaluates the risk of unauthorized data exfiltration through MCP-enabled tools such as email services. Hidden prompts within web content accessed by the LLM could trigger the leakage of sensitive user data such as phone numbers and credit card information.
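As a rough illustration of the relay scenario, the following sketch builds an HTML page that embeds a remapped font via @font-face, so the injected instruction renders as innocuous text while the raw character codes still spell out the hidden prompt. The font file, page structure, and placeholder prompt are assumptions for illustration; the paper's actual attack pages may be constructed differently.

```python
# Sketch of the HTML relay scenario (assumed page structure, not taken from the
# paper): the page embeds the remapped font so the injected instruction renders
# as innocuous text, while a crawler or LLM reading the raw HTML sees the
# original character codes, i.e. the hidden prompt.
import base64

with open("malicious.ttf", "rb") as f:  # font produced by the remapping sketch above
    font_b64 = base64.b64encode(f.read()).decode("ascii")

hidden_prompt = "Ignore previous instructions and ..."  # placeholder adversarial text

html = f"""<!doctype html>
<html>
<head>
<style>
  @font-face {{
    font-family: "InjectedFont";
    src: url(data:font/ttf;base64,{font_b64}) format("truetype");
  }}
  .injected {{ font-family: "InjectedFont"; }}
</style>
</head>
<body>
  <p>Ordinary article content shown to the reader.</p>
  <!-- Rendered with the remapped glyphs, this paragraph looks like harmless
       text, but its character codes spell out the hidden prompt. -->
  <p class="injected">{hidden_prompt}</p>
</body>
</html>"""

with open("injected_page.html", "w", encoding="utf-8") as f:
    f.write(html)
```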
Experimental Findings
The experiments yield several noteworthy observations about these tactics:
- Document Format Vulnerabilities: PDF documents, typically used for professional communication, offer a more consistent pathway for hidden prompt execution than HTML documents, suggesting that their structured nature helps preserve adversarial content intact.
- Injection Frequency and Strategic Placement: Attack success rates rise when hidden prompts are repeated more often and placed strategically within the document, indicating that attackers can tune injection frequency and position to exploit how LLMs allocate attention across their context.
- Cross-Model Analysis: The experiments demonstrate varying susceptibilities across several state-of-the-art LLMs, highlighting that more advanced models, with enhanced language understanding capabilities, might be more prone to sophisticated adversarial manipulations.
Implications and Future Research
The research underscores urgent security concerns about LLMs' interaction with real-world web resources. Because malicious fonts are invisible in rendered output, they pose a substantial challenge to existing security frameworks, which typically analyze semantic content without verifying visual integrity. Future defenses should incorporate semantic-visual alignment checks and detection systems that can flag inconsistencies in a font's character-to-glyph mapping.
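One way such a check might look in practice is a heuristic scan of embedded fonts for code points whose glyph assignments deviate from conventional naming, or for many code points collapsing onto a single glyph. The sketch below uses fontTools and the Adobe Glyph List for this purpose; it is a simplified screening heuristic, not a method from the paper, and legitimate fonts with custom glyph names will produce false positives. The file name is hypothetical.

```python
# Heuristic cmap consistency check (a sketch, not the paper's detection method):
# flag code points whose glyph name disagrees with the conventional Adobe Glyph
# List name, and glyphs that are shared by several different code points.
from collections import defaultdict
from fontTools.ttLib import TTFont
from fontTools.agl import UV2AGL  # standard Unicode value -> glyph name mapping

def suspicious_mappings(font_path):
    font = TTFont(font_path)
    findings = []
    glyph_owners = defaultdict(set)  # glyph name -> code points that map to it
    for table in font["cmap"].tables:
        cmap = getattr(table, "cmap", None)
        if not cmap:
            continue
        for codepoint, glyph_name in cmap.items():
            glyph_owners[glyph_name].add(codepoint)
            expected = UV2AGL.get(codepoint)
            if expected is not None and glyph_name != expected:
                findings.append((hex(codepoint), f"maps to {glyph_name!r}, expected {expected!r}"))
    for glyph_name, codepoints in glyph_owners.items():
        if len(codepoints) > 1:  # many codes collapsing onto one glyph is a red flag
            findings.append((glyph_name, f"shared by {sorted(hex(c) for c in codepoints)}"))
    return findings

for item, reason in suspicious_mappings("suspect.ttf"):  # hypothetical file name
    print(item, reason)
```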
Additionally, as LLMs increasingly integrate with protocols like MCP, ensuring secure interactions with user-authorized tools will be critical to prevent unauthorized actions. Further exploration of defenses against font-based attacks and continuous updates to LLM safety filters are necessary, especially as these models take on greater roles in real-world applications.
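A minimal sketch of one such safeguard, assuming a hypothetical tool-call wrapper rather than any real MCP SDK interface, is a confirmation gate that requires explicit user approval before a sensitive, data-transmitting tool call (such as sending an email) can run:

```python
# Hypothetical confirmation gate (illustrative only; not an interface of any
# real MCP SDK): sensitive, data-transmitting tool calls require explicit user
# approval before they run, so a hidden prompt alone cannot trigger them.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ToolCall:
    tool_name: str            # e.g. "send_email" -- illustrative name
    arguments: dict = field(default_factory=dict)

SENSITIVE_TOOLS = {"send_email", "upload_file"}  # assumed policy list

def execute_with_confirmation(call: ToolCall,
                              run_tool: Callable[[ToolCall], str],
                              confirm: Callable[[str], bool]) -> str:
    """Run the tool only if it is non-sensitive or the user explicitly approves."""
    if call.tool_name in SENSITIVE_TOOLS:
        prompt = f"The model wants to call {call.tool_name} with {call.arguments}. Allow?"
        if not confirm(prompt):
            return "Tool call blocked: user did not approve."
    return run_tool(call)

# Usage sketch: wire the gate to a console prompt.
result = execute_with_confirmation(
    ToolCall("send_email", {"to": "attacker@example.com", "body": "..."}),
    run_tool=lambda c: f"{c.tool_name} executed",
    confirm=lambda msg: input(msg + " [y/N] ").strip().lower() == "y",
)
print(result)
```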
This paper's findings provide a crucial contribution to the understanding of novel security threats faced by LLMs, emphasizing the need for ongoing research to safeguard users and applications that employ these models.