Language-assisted Vision Model Debugger: A Sample-Free Approach to Finding and Fixing Bugs (2312.05588v2)

Published 9 Dec 2023 in cs.AI and cs.CV

Abstract: Vision models with high overall accuracy often exhibit systematic errors in specific scenarios, posing potentially serious safety concerns. Diagnosing bugs in vision models is gaining increased attention; however, traditional diagnostic approaches require annotation effort (e.g., the rich metadata accompanying each sample of CelebA). To address this issue, we propose a language-assisted diagnostic method that uses text instead of images to diagnose bugs in vision models, building on multi-modal models (e.g., CLIP). Our approach connects the embedding space of CLIP with the buggy vision model to be diagnosed; by exploiting a shared classifier and the cross-modal transferability of CLIP's embedding space, the text branch of CLIP becomes a proxy model for finding bugs in the buggy model. The proxy model can classify texts paired with images. During diagnosis, an LLM is employed to obtain task-relevant corpora, from which keywords are extracted. Descriptions constructed from templates containing these keywords serve as input text to probe errors in the proxy model. Finally, we validate the ability to diagnose existing vision models using language on the Waterbirds and CelebA datasets, identifying bugs comprehensible to human experts and uncovering not only known bugs but also previously unknown ones.
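The pipeline in the abstract can be sketched in a few lines: because CLIP's text and image embeddings live in a shared space, a classifier head aligned with the buggy vision model can score text descriptions directly, and descriptions it misclassifies flag candidate bugs. The sketch below is illustrative only, under loud assumptions: the random vectors stand in for CLIP text embeddings, the linear head `W` stands in for the paper's shared classifier, and the keywords are hypothetical examples, not the ones the paper's LLM extracts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for CLIP: in the real method, embeddings come from CLIP's text
# encoder and the head is aligned with the buggy vision model's classifier.
EMB_DIM, N_CLASSES = 512, 2
W = rng.normal(size=(EMB_DIM, N_CLASSES))  # hypothetical shared classifier head

def classify(embedding: np.ndarray) -> int:
    """Apply the shared linear head to an L2-normalized embedding."""
    e = embedding / np.linalg.norm(embedding)
    return int(np.argmax(e @ W))

# Template-based probe descriptions built from extracted keywords
# (keywords here are illustrative, not from the paper).
keywords = ["bamboo forest", "ocean", "lake", "desert"]
probes = [f"a photo of a bird in a {kw}" for kw in keywords]

# Placeholder text embeddings, one per probe description.
text_embeddings = {p: rng.normal(size=EMB_DIM) for p in probes}

# Probe the proxy model: descriptions whose predicted label disagrees with
# the expected class for "bird" (class 0 here) flag candidate bugs.
expected = 0
suspects = [p for p in probes if classify(text_embeddings[p]) != expected]
print(suspects)
```

In the actual method, a large `suspects` set concentrated around one keyword (e.g., a background term) would suggest a systematic, human-interpretable failure mode of the underlying vision model.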

