Hijacking Context in Large Multi-modal Models (2312.07553v2)
Abstract: Recently, Large Multi-modal Models (LMMs) have demonstrated the ability to understand the visual content of images when given instructions about them. Built upon LLMs, LMMs inherit their capabilities and characteristics, such as in-context learning, where a coherent sequence of images and texts is given as the input prompt. However, we identify a new limitation of off-the-shelf LMMs: a small fraction of incoherent images or text descriptions misleads an LMM into generating output biased toward the hijacked context rather than the originally intended one. To address this, we propose a pre-filtering method that removes irrelevant contexts via GPT-4V, exploiting its robustness to distribution shift within the contexts. We further investigate whether replacing the hijacked visual and textual contexts with correlated ones, via GPT-4V and text-to-image models, helps yield coherent responses.
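The abstract's pre-filtering step can be pictured as a simple screening pass: show a vision-capable model the whole in-context sequence and ask it to flag the pairs that do not fit the dominant topic, then drop them before building the final prompt. Below is a minimal sketch of that idea, assuming the official `openai` Python client and a vision-capable chat model; the prompt wording, helper name, and model id are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of GPT-4V-based context pre-filtering (not the
# paper's exact procedure). Assumes the official `openai` Python client
# and a vision-capable chat model.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def filter_incoherent_examples(examples, model="gpt-4o"):
    """Ask a vision model which in-context (image, caption) pairs are
    incoherent with the majority context, and drop them.

    `examples` is a list of dicts: {"image_url": ..., "caption": ...}.
    """
    content = [{
        "type": "text",
        "text": ("Below are numbered (image, caption) pairs intended as a "
                 "coherent in-context sequence. Reply with the numbers of "
                 "pairs that do NOT fit the dominant topic, comma-separated, "
                 "or 'none'."),
    }]
    for i, ex in enumerate(examples):
        content.append({"type": "text", "text": f"Pair {i}: {ex['caption']}"})
        content.append({"type": "image_url",
                        "image_url": {"url": ex["image_url"]}})

    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": content}],
    ).choices[0].message.content

    # Parse the flagged indices and keep only the coherent pairs.
    flagged = set()
    for token in reply.replace("none", "").split(","):
        token = token.strip()
        if token.isdigit():
            flagged.add(int(token))
    return [ex for i, ex in enumerate(examples) if i not in flagged]
```

The surviving pairs would then be assembled into the in-context prompt as usual; the replacement variant the abstract mentions would additionally regenerate the flagged pairs with GPT-4V and a text-to-image model instead of merely dropping them.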