Document Screenshot Retrievers are Vulnerable to Pixel Poisoning Attacks (2501.16902v1)

Published 28 Jan 2025 in cs.IR

Abstract: Recent advancements in dense retrieval have introduced vision-language model (VLM)-based retrievers, such as DSE and ColPali, which leverage document screenshots embedded as vectors to enable effective search and offer a simplified pipeline over traditional text-only methods. In this study, we propose three pixel poisoning attack methods designed to compromise VLM-based retrievers and evaluate their effectiveness under various attack settings and parameter configurations. Our empirical results demonstrate that injecting even a single adversarial screenshot into the retrieval corpus can significantly disrupt search results, poisoning the top-10 retrieved documents for 41.9% of queries in the case of DSE and 26.4% for ColPali. These vulnerability rates notably exceed those observed with equivalent attacks on text-only retrievers. Moreover, when targeting a small set of known queries, the attack success rate rises, achieving complete success in certain cases. By exposing the vulnerabilities inherent in vision-language models, this work highlights the potential risks associated with their deployment.

Summary

  • The paper demonstrates that vision-language model-based document screenshot retrievers, specifically DSE and ColPali, are vulnerable to pixel poisoning attacks.
  • Authors propose three novel pixel-based attack methods—Direct, Noise, and Mask Direct Optimization—to manipulate image pixels and compromise retrieval effectiveness.
  • Results show attacks can poison top retrieval results even with minor pixel changes, with varying success rates depending on the model and whether queries are in-domain or out-of-domain.

This paper addresses the vulnerability of vision-language model (VLM)-based document screenshot retrievers to pixel poisoning attacks. The authors introduce three novel pixel-based attack methods designed to compromise the effectiveness of these retrievers, specifically targeting DSE (Document Screenshot Embedding) and ColPali. The paper highlights the potential risks associated with deploying VLM-based retrievers in scenarios where adversarial manipulation is a concern.

The core idea revolves around manipulating document screenshot images at the pixel level to influence retrieval outcomes. The authors propose three distinct attack methodologies:

  • Direct Optimization: This method directly modifies the pixel values of the screenshot image. Gradients of the loss with respect to the image pixels are computed so as to maximize the embedding similarity between the seed image and the target queries, and these gradients are used to update the pixels directly. To minimize the effect on the visual appearance of the image, only the top-$p$ percent of gradient values (by magnitude) is applied at each step.

    $$x_{i+1} = \mathrm{Clip}\left[ x_i - \alpha \cdot \mathrm{sign}\!\left( \frac{\nabla_x \mathcal{L}(x_i, C)}{\lVert \nabla_x \mathcal{L}(x_i, C) \rVert} \right) \right]$$

    where

    • $x \in \mathbb{R}^{H \times W \times C}$ is the adversarial document screenshot
      • $H$ is the height of the image
      • $W$ is the width of the image
      • $C$ is the number of color channels
    • $C$ is the target corpus
    • $\mathcal{R}$ is the target retriever
    • $\mathcal{L}$ is the loss function
    • $\alpha$ is the step size
  • Noise Optimization: This method learns an additive noise pattern that, when applied to the image, raises its ranking while preserving fidelity. Rather than altering the image itself, a separate noise image is iteratively optimized to maximize the attack's effectiveness.

    $$n_{i+1} = \mathrm{Clip}\left[ n_i + \alpha \cdot \mathrm{sign}\!\left( \frac{\nabla_n \mathcal{L}(x + n_i, C)}{\lVert \nabla_n \mathcal{L}(x + n_i, C) \rVert} \right) \right]$$

    where

    • $n \in \mathbb{R}^{H \times W \times C}$ is a noise image
    • $H$ is the height of the image
    • $W$ is the width of the image
    • $C$ is the number of color channels
    • $x$ is the original image
    • $C$ is the target corpus
    • $\mathcal{R}$ is the target retriever
    • $\mathcal{L}$ is the loss function
    • $\alpha$ is the step size
  • Mask Direct Optimization: This approach involves adding a mask margin around the seed image and updating only the pixels within that margin using gradients. The original image remains untouched, preserving its information.

    $$x_{i+1} = \mathrm{Clip}\left[ x_i - \alpha \cdot \mathrm{sign}\!\left( \frac{\widetilde{\nabla}_x \mathcal{L}(x_i, Q)}{\lVert \widetilde{\nabla}_x \mathcal{L}(x_i, Q) \rVert} \right) \right]$$

    where

    • $x$ is the adversarial document screenshot
    • $Q$ is the set of target queries
    • $\widetilde{\nabla}_x$ is the gradient restricted to the mask margin
    • $\mathcal{L}$ is the loss function
    • $\alpha$ is the step size

The attack methods share the same goal of maximizing the ranking of a seed document screenshot for a target retriever, given a set of queries. All three methods also include parameters that control the number of pixels updated.
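The three update rules above can be sketched as single optimization steps. This is a minimal numpy sketch, not the authors' implementation: the gradient of the retriever's loss is assumed to come from some external autodiff routine, pixel values are assumed normalized to [0, 1], and the noise clipping range of ±0.1 is an illustrative choice, not a value from the paper.

```python
import numpy as np

def normalized_sign_step(grad):
    """sign(grad / ||grad||), as written in the update rules; the sign()
    makes the norm mostly cosmetic, but we follow the equations."""
    return np.sign(grad / (np.linalg.norm(grad) + 1e-12))

def direct_step(x, grad, alpha, top_p):
    """Direct Optimization: update the image pixels themselves, but only
    the top-p fraction with the largest |gradient| (the fidelity knob)."""
    thresh = np.quantile(np.abs(grad), 1.0 - top_p)
    mask = (np.abs(grad) >= thresh).astype(x.dtype)
    return np.clip(x - alpha * normalized_sign_step(grad) * mask, 0.0, 1.0)

def noise_step(x, n, grad_n, alpha):
    """Noise Optimization: refine an additive noise image n; the original
    screenshot x is untouched and x + n is what gets embedded/indexed."""
    return np.clip(n + alpha * normalized_sign_step(grad_n), -0.1, 0.1)

def mask_direct_step(x, grad, alpha, margin_mask):
    """Mask Direct Optimization: zero the gradient outside a margin region
    around the image, so the original content pixels are never modified."""
    g = grad * margin_mask  # \tilde{\nabla}: gradient restricted to the margin
    return np.clip(x - alpha * normalized_sign_step(g) * margin_mask, 0.0, 1.0)
```

In all three cases the per-step change is bounded by `alpha` and by the pixel-selection parameter (`top_p` or `margin_mask`), which is the trade-off between attack effectiveness and visual fidelity discussed above.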

The authors evaluated the effectiveness of their attacks across varying levels of difficulty:

  • Targeting a small set of known (seen) queries.
  • Targeting unseen queries from the same distribution as the training data.
  • Targeting unseen queries from out-of-domain distributions.

The paper used the Wiki-SS-NQ and Vidore benchmarks for document screenshot retrieval. Wiki-SS-NQ is based on Google's Natural Questions dataset and consists of around 30k training queries and 3,610 test queries, with a corpus of approximately 1.2 million document screenshots. The Vidore benchmark is a collection of 10 screenshot retrieval datasets that includes multiple domains, languages, and modalities. Attack effectiveness was evaluated using top-k attack success rate (success@k) and mean reciprocal rank of attack at rank 100 (MRRA@100).
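The two evaluation metrics are straightforward to compute given each query's ranked result list. The sketch below reflects one plausible reading of success@k and MRRA@100 (an adversarial document counts as a hit, and queries with no adversarial document in the top 100 contribute zero to MRRA); the exact definitions are the paper's.

```python
def success_at_k(rankings, adv_ids, k):
    """Fraction of queries whose top-k result list contains at least one
    adversarial document. rankings: list of ranked doc-id lists, one per query."""
    hits = sum(any(d in adv_ids for d in docs[:k]) for docs in rankings)
    return hits / len(rankings)

def mrra_at_100(rankings, adv_ids, cutoff=100):
    """Mean reciprocal rank of the best-ranked adversarial document within
    the top `cutoff` results; a query with no hit contributes 0."""
    total = 0.0
    for docs in rankings:
        for rank, d in enumerate(docs[:cutoff], start=1):
            if d in adv_ids:
                total += 1.0 / rank
                break
    return total / len(rankings)
```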

The results demonstrate that VLM-based dense retrievers are vulnerable to pixel-based attacks. In in-domain attacks, injecting a single adversarial screenshot document can poison the top-10 retrieved documents for 41.9% of queries for DSE and 26.4% for ColPali. The success rate increases when attacks target a small set of known queries.

The authors found that for both the DSE and ColPali retrievers, high attack effectiveness (complete success@5) can be achieved while updating only a very small fraction of gradient values per step (top-$p$ of 0.5% to 1%) for the direct and noise optimization methods, and with a mask area of just 3% for mask direct optimization. Additionally, ColPali appears to be more vulnerable than DSE in the targeted query setting. However, ColPali appears more robust than DSE when attacking a large set of in-domain queries.

In out-of-domain settings, adversarial documents trained on Wiki-SS-NQ exhibit good attack generalization to some datasets like Docv and Infov, but not to others like Tatdqa and ShiftPr.

The authors discuss the practical applicability of their attacks for corpus poisoning and search engine optimization (SEO), noting the trade-offs between attack effectiveness and fidelity. They acknowledge limitations, including the white-box nature of their attacks and the focus on specific datasets and models. They suggest that future work could explore defense strategies and black-box attack methods.
