Adapting Multi-modal Large Language Model to Concept Drift From Pre-training Onwards
Abstract: Multi-modal LLMs (MLLMs) frequently face concept drift when dealing with real-world streaming data, whose distributions change unpredictably. The drift takes two main forms: gradual drift caused by long-tailed data and sudden drift caused by Out-Of-Distribution (OOD) data, both of which have drawn increasing attention from the research community. While these issues have been extensively studied in the individual domains of vision and language, their impact on MLLMs in concept drift settings remains largely underexplored. In this paper, we reveal the vulnerability of Vision-Language (VL) models to significant biases arising from gradual and sudden drift, particularly during pre-training. To address these challenges, we propose a unified framework that extends concept drift theory to the multi-modal domain, enhancing the adaptability of the VL model to unpredictable distribution changes. Additionally, we propose a T-distribution based drift adapter that mitigates the bias induced by gradual drift and, through explicit distribution modeling, helps the model distinguish sudden distribution changes. Extensive experiments demonstrate that our method enhances the efficiency and accuracy of image-text alignment in the pre-training of VL models, particularly in the concept drift scenario. Moreover, various downstream tasks show significant improvements in our model's ability to adapt to the long-tailed open world. Furthermore, we create a set of multi-modal datasets called OpenMMlo, specifically tailored for the long-tailed open-world setting, to validate our findings. To foster the development of the multi-modal community, we have made both the OpenMMlo datasets and our code publicly available at: https://github.com/XiaoyuYoung/ConceptDriftMLLMs.