
Multi-Modal Proxy Learning Towards Personalized Visual Multiple Clustering (2404.15655v1)

Published 24 Apr 2024 in cs.CV

Abstract: Multiple clustering has gained significant attention in recent years due to its potential to reveal multiple hidden structures of data from different perspectives. The advent of deep multiple clustering techniques has notably advanced performance by uncovering complex patterns and relationships within large datasets. However, a major challenge arises: users often do not need all the clusterings that algorithms generate, and figuring out which one is needed requires a substantial understanding of each clustering result. Traditionally, aligning a user's brief keyword of interest with the corresponding vision components was challenging, but the emergence of multi-modal models and large language models (LLMs) has begun to bridge this gap. In response, given unlabeled target visual data, we propose Multi-MaP, a novel method employing a multi-modal proxy learning process. It leverages CLIP encoders to extract coherent text and image embeddings, with GPT-4 integrating users' interests to formulate effective textual contexts. Moreover, a reference word constraint and a concept-level constraint are designed to learn the optimal text proxy according to the user's interest. Multi-MaP not only adeptly captures a user's interest via a keyword but also facilitates identifying relevant clusterings. Our extensive experiments show that Multi-MaP consistently outperforms state-of-the-art methods in all benchmark multi-clustering vision tasks. Our code is available at https://github.com/Alexander-Yao/Multi-MaP.
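The proxy-learning idea in the abstract can be illustrated with a small toy sketch. This is not the Multi-MaP implementation: random unit vectors stand in for CLIP image and text embeddings, and the objective (image similarity plus a weighted pull toward a single reference-word embedding) is a simplified assumption rather than the paper's loss. The names `normalize`, `learn_proxies`, and `lam` are hypothetical.

```python
import numpy as np


def normalize(v):
    """Scale vectors to unit L2 norm along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def learn_proxies(image_emb, reference_emb, lam=0.5, lr=0.1, steps=200, seed=0):
    """Learn one unit-norm proxy per image by projected gradient ascent.

    Assumed toy objective for each image i:
        <w_i, x_i> + lam * <w_i, r>
    where x_i is the image embedding and r is the embedding of a reference
    word related to the user's keyword. This is an illustrative stand-in,
    not the loss used in the paper.
    """
    rng = np.random.default_rng(seed)
    w = normalize(rng.normal(size=image_emb.shape))  # random initial proxies
    for _ in range(steps):
        # The objective is linear in w, so its gradient is constant.
        grad = image_emb + lam * reference_emb
        # Take a gradient step, then project back onto the unit sphere.
        w = normalize(w + lr * grad)
    return w


# Stand-in data: 8 "image" embeddings and one "reference word" embedding.
demo_rng = np.random.default_rng(42)
images = normalize(demo_rng.normal(size=(8, 32)))
reference = normalize(demo_rng.normal(size=32))

proxies = learn_proxies(images, reference, lam=0.5)
print("mean similarity to reference word:", float((proxies @ reference).mean()))
```

In this toy setting, a larger `lam` pulls every proxy closer to the reference word, while `lam=0` leaves each proxy aligned with its own image embedding. The learned proxies could then be grouped with any standard clustering algorithm, such as k-means, to obtain a clustering focused on the chosen keyword.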
