
Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection (2403.01300v2)

Published 2 Mar 2024 in cs.CV

Abstract: RGBT multispectral pedestrian detection has emerged as a promising solution for safety-critical applications that require day/night operation. However, the modality bias problem remains unsolved, because multispectral pedestrian detectors learn the statistical bias in their training datasets. Specifically, multispectral pedestrian detection datasets consist mainly of ROTO (day) and RXTO (night) data, so the majority of pedestrian labels statistically co-occur with thermal features. As a result, multispectral pedestrian detectors generalize poorly to examples outside this statistical correlation, such as ROTX data. To address this problem, we propose a novel Causal Mode Multiplexer (CMM) framework that effectively learns the causalities between multispectral inputs and predictions. Moreover, we construct a new dataset (ROTX-MP) to evaluate modality bias in multispectral pedestrian detection. ROTX-MP mainly includes ROTX examples not present in previous datasets. Extensive experiments demonstrate that our proposed CMM framework generalizes well to existing datasets (KAIST, CVC-14, FLIR) and the new ROTX-MP. We will release our new dataset to the public for future research.
