A Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges (2501.02189v6)

Published 4 Jan 2025 in cs.CV, cs.AI, cs.CL, cs.LG, and cs.RO

Abstract: Multimodal Vision Language Models (VLMs) have emerged as a transformative topic at the intersection of computer vision and natural language processing, enabling machines to perceive and reason about the world through both visual and textual modalities. For example, models such as CLIP, Claude, and GPT-4V demonstrate strong reasoning and understanding abilities on visual and textual data and outperform classical single-modality vision models on zero-shot classification [93]. Given their rapid research advances and growing popularity across applications, we provide a comprehensive survey of VLMs. Specifically, we give a systematic overview of VLMs covering: (1) model information for the major VLMs developed up to 2025; (2) the evolution of VLM architectures and the newest VLM alignment methods; (3) a summary and categorization of popular VLM benchmarks and evaluation metrics; and (4) the challenges faced by current VLMs, such as hallucination, alignment, fairness, and safety. Detailed collections, including papers and model repository links, are listed at https://github.com/zli12321/Vision-Language-Models-Overview.

Authors (6)

GitHub

GitHub - zli12321/VLM-surveys: A most Frontend Collection and survey of vision-language model papers, and models GitHub repository (2 stars)
