RGB-D And Thermal Sensor Fusion: A Systematic Literature Review (2305.11427v2)
Abstract: In the last decade, the computer vision field has seen significant progress in multimodal data fusion and learning, where multiple sensors, including depth, infrared, and visual, are used to capture the environment across diverse spectral ranges. Despite these advancements, there has been no systematic and comprehensive evaluation of fusing RGB-D and thermal modalities to date. While autonomous driving using LiDAR, radar, RGB, and other sensors has garnered substantial research interest, as has the fusion of RGB and depth modalities, the integration of thermal cameras, and specifically the fusion of RGB-D and thermal data, has received comparatively less attention. This may be partly due to the limited number of publicly available datasets for such applications. This paper provides a comprehensive review of both state-of-the-art and traditional methods for fusing RGB-D and thermal camera data across applications such as site inspection, human tracking, and fault detection. The reviewed literature is categorised into technical areas, including 3D reconstruction, segmentation, object detection, available datasets, and other related topics. Following a brief introduction and an overview of the methodology, the study delves into calibration and registration techniques, then examines thermal visualisation and 3D reconstruction, before discussing the application of classic feature-based techniques as well as modern deep learning approaches. The paper concludes with a discussion of current limitations and potential future research directions. It is hoped that this survey will serve as a valuable reference for researchers looking to familiarise themselves with the latest advancements and contribute to the RGB-DT research field.