Personalization Toolkit: Training Free Personalization of Large Vision Language Models (2502.02452v2)
Abstract: Large Vision LLMs (LVLMs) have significant potential to provide personalized assistance by adapting to the unique needs and preferences of individual users. The personalization of LVLMs has emerged as a field that focuses on customizing models to recognize specific object instances and provide tailored responses. However, current methodologies depend on time-consuming test-time training for each user and object, which proves to be impractical. This paper introduces a novel, training-free approach to LVLM personalization by leveraging pre-trained vision foundation models to extract distinct features, retrieval-augmented generation (RAG) techniques to recognize instances in the visual input, and visual prompting methods. Our model-agnostic vision toolkit enables flexible and efficient personalization without the need for extensive retraining. We demonstrate state-of-the-art results, surpassing conventional training-based approaches, and set a new benchmark for LVLM personalization.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.