The paper, titled "Sales Whisperer: A Human-Inconspicuous Attack on LLM Brand Recommendations," examines the security risks of relying on LLMs for brand recommendations, focusing on how prompts can be manipulated to skew LLM responses toward specific brands. Its central thesis is that small, seemingly inconspicuous changes to a prompt can induce an LLM to favor certain brands without human users noticing.
Key Contributions and Findings:
- Impact of Prompt Paraphrasing:
- The paper demonstrates that subtle paraphrasing of prompts can lead to significant variations in the probability that an LLM mentions a particular brand; in extreme cases, a change in phrasing can increase the likelihood of a brand being recommended by as much as 100% (a sketch of how such mention probabilities can be estimated follows this list).
- Human-Inconspicuous Attack:
- The authors introduce an approach that perturbs base prompts through synonym replacement, increasing the likelihood of an LLM mentioning a targeted brand by up to 78.3%. The perturbations are designed to be human-inconspicuous: in normal interactions, users do not notice anything unusual about the altered prompts or the resulting LLM responses.
- Threat Model Analysis:
- Various threat models are analyzed in which adversaries suggest crafted prompts to users or infiltrate prompt-sharing platforms, with the intention of biasing LLM brand recommendations for economic gain.
- User Study Verification:
- An extensive user study validates the human-inconspicuous nature of the proposed perturbations. Participants did not perceive the perturbed prompts or responses as significantly more biased or targeted than unperturbed ones, confirming the stealthiness of the attack.
- Transferability of Attacks:
- The paper also investigates whether synonym-replacement attacks transfer to other LLMs, including GPT-3.5 Turbo. The attack succeeds to varying degrees across model architectures, suggesting that some models are more susceptible to it than others.
- Dataset Creation:
- To evaluate their methodology, the authors created a dataset of 449 prompts spanning 77 product categories, which serves as the basis for testing how effectively the synonym-replacement approach skews LLM recommendations.
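To make the paraphrasing finding above concrete, the sketch below estimates a brand-mention probability by repeatedly sampling completions and checking for the brand name. The `query_llm` helper, the brand, and the prompts are illustrative placeholders rather than artifacts of the paper.

```python
import re

def brand_mention_rate(prompt, brand, query_llm, n_samples=50):
    """Estimate the probability that `brand` appears in a response to `prompt`.

    `query_llm` is a placeholder for whatever chat-completion call is
    available (an API wrapper or a local pipeline); it takes a prompt
    string and returns a response string.
    """
    pattern = re.compile(re.escape(brand), re.IGNORECASE)
    hits = sum(bool(pattern.search(query_llm(prompt))) for _ in range(n_samples))
    return hits / n_samples

# Illustrative comparison of a base prompt and a paraphrase for one brand:
# p_base = brand_mention_rate("What running shoes should I buy?", "Asics", query_llm)
# p_para = brand_mention_rate("Which running shoes would you recommend?", "Asics", query_llm)
# print(f"Absolute shift: {abs(p_para - p_base):.0%}")
```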
Detailed Methodology:
- The paper emphasizes "loss-based synonym replacements" as a strategy for perturbing prompts without direct access to the LLM's weights: a loss computed from the LLM's output logits measures how strongly the model is inclined to mention a brand-related term, and replacements are chosen to increase that inclination (sketched below).
- The approach does not require extensive computational resources, since it relies on logit-guided synonym replacements rather than full rephrasing or brute-force testing of large numbers of prompts.
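The sketch below shows one way such a logit-guided synonym search could look, assuming an open-weight causal LM from Hugging Face (GPT-2 as a stand-in) for scoring and WordNet for synonym candidates. The loss used here, the negative log-probability of the brand's first token immediately after the prompt, is an illustrative proxy for the paper's objective, not the authors' exact formulation, and the greedy single-swap search is only one possible strategy.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from nltk.corpus import wordnet  # requires nltk.download("wordnet")

tok = AutoTokenizer.from_pretrained("gpt2")            # stand-in scoring model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def brand_loss(prompt: str, brand: str = " Asics") -> float:
    """Negative log-probability of the brand's first token as the next token.

    The leading space in `brand` matters for GPT-2's tokenizer.
    """
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]              # next-token logits
    log_probs = torch.log_softmax(logits, dim=-1)
    brand_id = tok(brand, add_special_tokens=False).input_ids[0]
    return -log_probs[brand_id].item()

def perturb(prompt: str, brand: str = " Asics") -> str:
    """Return the single-word synonym swap that most lowers the brand loss."""
    words = prompt.split()
    best_prompt, best_loss = prompt, brand_loss(prompt, brand)
    for i, word in enumerate(words):
        synonyms = {lemma.name().replace("_", " ")
                    for syn in wordnet.synsets(word)
                    for lemma in syn.lemmas()} - {word}
        for replacement in synonyms:
            candidate = " ".join(words[:i] + [replacement] + words[i + 1:])
            loss = brand_loss(candidate, brand)
            if loss < best_loss:
                best_prompt, best_loss = candidate, loss
    return best_prompt

# print(perturb("Suggest some good running shoes for a beginner."))
```

Because only logits are needed, the same scoring could in principle be done through any interface that exposes token log-probabilities; the local model above is used purely to keep the example self-contained.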
Numerical Results and Analysis:
- The empirical analysis shows that the synonym-replacement approach increases brand-mention probabilities, with the strongest reported effect on Gemma-it: a maximum absolute improvement of up to 52.8% (the metric is sketched after this list).
- The paper suggests that adversaries could realistically use this method in practice to surreptitiously promote specific brands through LLM interactions, exploiting the bias they induce in brand-recommendation tasks.
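One plausible reading of "maximum absolute improvement", assumed for the sketch below, is the largest per-prompt increase in mention probability expressed in percentage points; the numbers used are made up for illustration and are not results from the paper.

```python
def max_absolute_improvement(base_probs, perturbed_probs):
    """Largest per-prompt increase in brand-mention probability, in percentage points."""
    return max(p - b for b, p in zip(base_probs, perturbed_probs)) * 100

# Made-up mention probabilities before and after perturbation:
print(f"{max_absolute_improvement([0.10, 0.32, 0.05], [0.55, 0.40, 0.58]):.1f}")  # 53.0
```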
In conclusion, this work highlights novel security challenges in consumer-facing LLM applications, demonstrating that LLMs can be subtly manipulated through crafted prompts and that this manipulation can steer users' brand choices. The paper contributes to the broader discourse on LLM security by underscoring the practical implications of prompt-based attacks and the need for vigilant defenses in AI-driven recommendation systems.