
Identifying and Mitigating Model Failures through Few-shot CLIP-aided Diffusion Generation (2312.05464v1)

Published 9 Dec 2023 in cs.CV and cs.LG

Abstract: Deep learning models can encounter unexpected failures, especially when dealing with challenging sub-populations. One common reason for these failures is the occurrence of objects in backgrounds that are rarely seen during training. To gain a better understanding of these failure modes, human-interpretable descriptions are crucial for further analysis and improvement, but producing them manually is expensive. In this study, we propose an end-to-end framework that utilizes the capabilities of LLMs (ChatGPT) and vision-language deep models (CLIP) to generate text descriptions of failure modes associated with spurious correlations (e.g. rarely seen backgrounds) without human-in-the-loop intervention. These descriptions can be used to generate synthetic data using generative models, such as diffusion models. The model can then learn from its weaknesses on this generated data and improve its performance on backgrounds that are uncommon for each class. Our approach serves as a broad solution, promising progress in comprehending model failure modes and strengthening deep learning models across a wide range of failure scenarios (e.g. backgrounds, colors) automatically in a few-shot manner. Our experiments have shown remarkable \textbf{improvements in accuracy ($\sim \textbf{21\%}$)} on hard sub-populations (particularly for wrong background associations) across $40$ different models, such as ResNets, EfficientNets, DenseNets, Vision Transformers (ViT), SwAVs, MoCos, DINOs, and CLIPs, on various datasets such as ImageNet-1000, CIFAR-10, and CIFAR-100.
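The abstract outlines a three-stage loop: an LLM (ChatGPT) proposes candidate text descriptions of spurious correlations, CLIP scores those descriptions against the classifier's misclassified images to identify the dominant failure mode, and a diffusion model synthesizes training data from the winning description. A minimal sketch of the CLIP-scoring and generation stages follows, assuming the Hugging Face transformers and diffusers libraries; the model IDs, candidate prompts, and helper names are illustrative assumptions rather than the paper's actual configuration.

```python
# Illustrative sketch of the pipeline described in the abstract.
# All model IDs, prompts, and function names are hypothetical choices
# for this sketch, not the paper's exact configuration.
import torch
from transformers import CLIPModel, CLIPProcessor
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def rank_failure_descriptions(misclassified, descriptions):
    """Score LLM-proposed failure descriptions against the classifier's
    misclassified images; a higher mean CLIP similarity suggests the
    description better characterizes the shared failure mode."""
    inputs = processor(text=descriptions, images=misclassified,
                       return_tensors="pt", padding=True).to(device)
    with torch.no_grad():
        sim = clip(**inputs).logits_per_text  # (num_texts, num_images)
    scores = sim.mean(dim=1).tolist()
    return sorted(zip(descriptions, scores), key=lambda p: p[1], reverse=True)


def synthesize(prompt, n=8):
    """Generate synthetic training images for the top-ranked failure mode."""
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5").to(device)
    return pipe([prompt] * n).images


# Example: candidate "rare background" descriptions an LLM might propose
# for the class "dog"; the classifier would then be fine-tuned on images
# synthesized for the best-matching description.
candidates = ["a photo of a dog underwater",
              "a photo of a dog in deep snow",
              "a photo of a dog on a crowded street"]
# top_description, _ = rank_failure_descriptions(failure_images, candidates)[0]
# extra_training_data = synthesize(top_description)
```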
