Persistence of CLIMP advantages at large data and model scales
Determine whether the performance advantages of CLIMP (a fully Mamba-based contrastive vision-language model using VMamba for vision and Mamba-1/2 for text) persist when the training data is scaled to the LAION-2B dataset and/or the vision backbone is scaled to sizes comparable to ViT-L and ViT-H.
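As a contrastive language-image pretraining method, CLIMP presumably optimizes a CLIP-style symmetric contrastive (InfoNCE) objective over paired image and text embeddings; the source excerpt does not spell this out, so the following is a minimal NumPy sketch under that assumption, with the function name, temperature value, and embedding shapes chosen purely for illustration:

```python
import numpy as np

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss for a batch of paired image/text embeddings.

    img_emb, txt_emb: arrays of shape (N, D), row i of each forming a pair.
    """
    # L2-normalize so the logits are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # (N, N); matching pairs on the diagonal
    labels = np.arange(logits.shape[0])

    def xent(l):
        # Cross-entropy with the diagonal as the target class, stabilized.
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the image->text and text->image directions.
    return 0.5 * (xent(logits) + xent(logits.T))
```

The scaling question above then amounts to whether a loss of this form, with Mamba-based rather than Transformer-based encoders producing `img_emb` and `txt_emb`, retains its advantages as batch data (LAION-2B) and encoder size (ViT-L/H scale) grow.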
References
While our scaling experiments suggest continued improvements, it remains to be verified whether CLIMP's advantages persist at the scale of LAION-2B or ViT-L/H architectures.
— CLIMP: Contrastive Language-Image Mamba Pretraining
(2601.06891 - Shabtay et al., 11 Jan 2026) in Section: Limitations