Multi-method Integration with Confidence-based Weighting for Zero-shot Image Classification (2405.02155v1)

Published 3 May 2024 in cs.CV

Abstract: This paper introduces a novel framework for zero-shot learning (ZSL), i.e., to recognize new categories that are unseen during training, by using a multi-model and multi-alignment integration method. Specifically, we propose three strategies to enhance the model's performance to handle ZSL: 1) Utilizing the extensive knowledge of ChatGPT and the powerful image generation capabilities of DALL-E to create reference images that can precisely describe unseen categories and classification boundaries, thereby alleviating the information bottleneck issue; 2) Integrating the results of text-image alignment and image-image alignment from CLIP, along with the image-image alignment results from DINO, to achieve more accurate predictions; 3) Introducing an adaptive weighting mechanism based on confidence levels to aggregate the outcomes from different prediction methods. Experimental results on multiple datasets, including CIFAR-10, CIFAR-100, and TinyImageNet, demonstrate that our model can significantly improve classification accuracy compared to single-model approaches, achieving AUROC scores above 96% across all test datasets, and notably surpassing 99% on the CIFAR-10 dataset.
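The third strategy in the abstract, confidence-based weighting across prediction methods, can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so everything here is an assumption rather than the authors' method: each method's class scores are turned into probabilities with a softmax, the maximum probability is taken as that method's confidence, and the confidences are normalized into fusion weights. The function names and the similarity scores are hypothetical.

```python
import numpy as np


def softmax(scores, temperature=1.0):
    """Convert raw similarity scores into a probability vector."""
    z = np.asarray(scores, dtype=float) / temperature
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def confidence_weighted_fusion(score_sets, temperature=1.0):
    """Fuse per-method class scores, weighting each method by its confidence.

    Assumption (not from the paper): confidence is the maximum softmax
    probability of a method, and weights are the normalized confidences.
    """
    probs = np.stack([softmax(s, temperature) for s in score_sets])
    confidences = probs.max(axis=1)             # one confidence per method
    weights = confidences / confidences.sum()   # weights sum to 1
    fused = weights @ probs                     # weighted average over methods
    return fused, weights


# Hypothetical similarity scores for three candidate classes.
clip_text_image = [2.0, 0.5, 0.1]    # CLIP text-image alignment
clip_image_image = [1.0, 0.9, 0.8]   # CLIP image-image alignment (nearly flat)
dino_image_image = [3.0, 0.2, 0.1]   # DINO image-image alignment (peaked)

fused, weights = confidence_weighted_fusion(
    [clip_text_image, clip_image_image, dino_image_image]
)
predicted = int(np.argmax(fused))
```

In this sketch the nearly flat CLIP image-image scores receive the smallest weight, so a method that cannot separate the classes contributes less to the fused prediction. Other confidence measures, such as entropy or the margin between the top two classes, would fit the same structure.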

Authors (2)
