Counterfactual Co-occurring Learning for Bias Mitigation in Weakly-supervised Object Localization (2305.15354v2)
Abstract: Contemporary weakly-supervised object localization (WSOL) methods have focused primarily on localizing the most discriminative region, while largely overlooking the less explored problem of biased activation: incorrectly highlighting co-occurring background along with the foreground object. In this paper, we conduct a causal analysis of the origins of biased activation and attribute it to co-occurring background confounders. Building on this analysis, we introduce Counterfactual Co-occurring Learning (CCL), a paradigm that generates counterfactual representations by disentangling the foreground from co-occurring background elements. We further propose a network architecture, Counterfactual-CAM, which incorporates a perturbation mechanism for counterfactual representations into the vanilla CAM-based model. Training the WSOL model with these perturbed representations guides it to focus on the consistent foreground content while reducing the influence of distracting co-occurring backgrounds. To the best of our knowledge, this study is the first exploration of this research direction. Extensive experiments across multiple benchmarks validate the effectiveness of Counterfactual-CAM in mitigating biased activation.
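To make the perturbation idea concrete, below is a minimal PyTorch sketch of how a counterfactual perturbation on CAM features might look: pooled features are split into foreground and co-occurring background parts using the class activation map, and counterfactual representations are formed by pairing each sample's foreground with backgrounds drawn from other samples in the batch. The masking threshold, the batch-shuffle pairing, and all class and variable names here are illustrative assumptions, not the authors' exact Counterfactual-CAM design.

```python
# Sketch of counterfactual perturbation on CAM features (assumptions, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CounterfactualCAMSketch(nn.Module):
    def __init__(self, backbone, num_classes, feat_dim=2048, fg_thresh=0.5):
        super().__init__()
        self.backbone = backbone            # any CNN returning (B, C, H, W) feature maps
        self.classifier = nn.Linear(feat_dim, num_classes)
        self.fg_thresh = fg_thresh          # assumed foreground threshold on the normalized CAM

    def forward(self, images, labels=None):
        feats = self.backbone(images)                     # (B, C, H, W)
        B, C, H, W = feats.shape
        logits = self.classifier(feats.mean(dim=(2, 3)))  # vanilla CAM-style prediction
        if labels is None:
            return logits

        # Class activation maps for the ground-truth classes, normalized to [0, 1].
        cam = torch.einsum("bchw,bc->bhw", feats, self.classifier.weight[labels])
        cam_min = cam.amin(dim=(1, 2), keepdim=True)
        cam_max = cam.amax(dim=(1, 2), keepdim=True)
        cam = (cam - cam_min) / (cam_max - cam_min + 1e-6)
        fg_mask = (cam >= self.fg_thresh).float().unsqueeze(1)          # (B, 1, H, W)

        # Disentangle pooled foreground and co-occurring background representations.
        fg_feat = (feats * fg_mask).sum(dim=(2, 3)) / (fg_mask.sum(dim=(2, 3)) + 1e-6)
        bg_feat = (feats * (1 - fg_mask)).sum(dim=(2, 3)) / ((1 - fg_mask).sum(dim=(2, 3)) + 1e-6)

        # Counterfactual perturbation: keep the foreground, swap in another sample's background.
        perm = torch.randperm(B, device=feats.device)
        cf_feat = fg_feat + bg_feat[perm]

        # Encourage predictions to rely on the consistent foreground, not the background.
        loss = F.cross_entropy(logits, labels) + F.cross_entropy(self.classifier(cf_feat), labels)
        return logits, loss
```

The key design choice this sketch illustrates is that the classification loss is also applied to representations whose background has been perturbed, so the only signal that remains consistent across the original and counterfactual views is the foreground content.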