Tracking Passengers and Baggage Items Using Multiple Overhead Cameras at Security Checkpoints (2007.07924v3)

Published 15 Jul 2020 in cs.CV

Abstract: We introduce a novel framework to track multiple objects in overhead camera videos for airport checkpoint security scenarios, where the targets are passengers and their baggage items. We propose a self-supervised learning (SSL) technique that provides the model with information about instance segmentation uncertainty in overhead images. Our SSL approach improves object detection by employing test-time data augmentation and a regression-based, rotation-invariant pseudo-label refinement technique. Our pseudo-label generation method provides multiple geometrically transformed images as inputs to a convolutional neural network (CNN), regresses the augmented detections generated by the network to reduce localization errors, and then clusters them using the mean-shift algorithm. The self-supervised detector model is used in a single-camera tracking algorithm to generate temporal identifiers for the targets. Our method also incorporates a multiview trajectory association mechanism to maintain consistent temporal identifiers as passengers travel across camera views. An evaluation of detection, tracking, and association performance on videos obtained from multiple overhead cameras in a realistic airport checkpoint environment demonstrates the effectiveness of the proposed approach. Our results show that self-supervision improves object detection accuracy by up to 42% without increasing the inference time of the model. Our multicamera association method achieves up to 89% multiobject tracking accuracy with an average computation time of less than 15 ms.
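To make the pseudo-label pipeline above concrete, here is a minimal sketch of the test-time augmentation and mean-shift steps in Python. It is an illustration under stated assumptions, not the paper's implementation: it assumes square overhead frames, limits the geometric transforms to 90-degree rotations, treats `detector` as a hypothetical callable returning an (N, 4) array of [cx, cy, w, h] boxes, and lets the per-cluster mean stand in for the paper's regression-based localization refinement.

```python
import numpy as np
from sklearn.cluster import MeanShift

def unrotate_point(x, y, k, size):
    """Map a point from a frame rotated k*90 degrees counterclockwise
    (as produced by np.rot90 on a square image) back to the original frame."""
    for _ in range(k % 4):
        x, y = size - 1 - y, x  # inverse of one 90-degree CCW rotation
    return x, y

def refine_pseudo_labels(image, detector, bandwidth=30.0):
    """Test-time augmentation by rotation, followed by mean-shift clustering
    of the re-projected detections. `detector` is a placeholder for any CNN
    detector; the cluster mean below is a simple stand-in for the paper's
    regression-based refinement step."""
    size = image.shape[0]  # assumes a square overhead frame
    pooled = []
    for k in range(4):  # 0-, 90-, 180-, and 270-degree rotations
        boxes = detector(np.rot90(image, k=k))
        for cx, cy, w, h in boxes:
            cx0, cy0 = unrotate_point(cx, cy, k, size)
            if k % 2:  # width and height swap under 90- and 270-degree turns
                w, h = h, w
            pooled.append([cx0, cy0, w, h])
    pooled = np.asarray(pooled)
    if pooled.size == 0:
        return pooled

    # Each mean-shift cluster of box centers becomes one refined pseudo-label.
    labels = MeanShift(bandwidth=bandwidth).fit(pooled[:, :2]).labels_
    return np.stack([pooled[labels == k].mean(axis=0) for k in np.unique(labels)])
```

The cross-view association step can be sketched in the same hedged spirit: assuming each camera's tracks have already been projected into a shared ground-plane frame (e.g., through a homography), a bipartite matching on a simple trajectory distance maintains identities across views. The mean pointwise distance and the Hungarian solver below are illustrative stand-ins for the paper's association mechanism, not a reconstruction of it.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def mean_distance(ta, tb):
    """Average pointwise distance over the common length of two tracks."""
    n = min(len(ta), len(tb))
    return float(np.linalg.norm(ta[:n] - tb[:n], axis=1).mean())

def associate_trajectories(tracks_a, tracks_b, max_cost=50.0):
    """Hungarian matching of trajectories from two overlapping views.
    Each track is a (T, 2) array of positions assumed to be already
    projected into a shared ground-plane coordinate frame."""
    cost = np.array([[mean_distance(ta, tb) for tb in tracks_b] for ta in tracks_a])
    rows, cols = linear_sum_assignment(cost)
    # Keep only plausible matches; unmatched tracks keep their own identities.
    return [(i, j) for i, j in zip(rows, cols) if cost[i, j] <= max_cost]
```

In both sketches the thresholds (`bandwidth`, `max_cost`) are assumed tuning parameters with no counterpart stated in the abstract.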
