OpenAI is working to launch its next-generation multimodal large language model (LLM), GPT-Vision, before Google's Gemini model debuts.
Multimodal LLMs can process more than one form of data, such as text and images, which suits them to applications ranging from natural language understanding to image interpretation.