
MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark (2403.20225v1)

Published 29 Mar 2024 in cs.CV

Abstract: Multi-target multi-camera tracking is a crucial task that involves identifying and tracking individuals over time using video streams from multiple cameras. This task has practical applications in various fields, such as visual surveillance, crowd behavior analysis, and anomaly detection. However, collecting and labeling data for this task is difficult and costly. As a result, existing datasets are either synthetically generated or artificially constructed within a controlled camera network setting, which limits their ability to model real-world dynamics and to generalize to diverse camera configurations. To address this issue, we present MTMMC, a real-world, large-scale dataset that includes long video sequences captured by 16 multi-modal cameras in two different environments (campus and factory) across various times, weather conditions, and seasons. This dataset provides a challenging testbed for studying multi-camera tracking under diverse real-world complexities. It also adds a thermal input modality, with RGB and thermal cameras spatially aligned and temporally synchronized, which improves the accuracy of multi-camera tracking. MTMMC is a superset of existing datasets and benefits independent fields such as person detection, re-identification, and multiple object tracking. We provide baselines and new learning setups on this dataset and set reference scores for future studies. The datasets, models, and test server will be made publicly available.
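The abstract defines multi-target multi-camera tracking only at a high level. As a generic illustration (not the MTMMC baselines, which the abstract does not describe), one common second stage links single-camera tracklets into global identities by comparing appearance (re-identification) embeddings. The sketch below assumes each tracklet already carries a mean embedding. The `Tracklet` class, the `associate_across_cameras` function, the 0.7 threshold, and the rule that an identity holds at most one tracklet per camera are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Tracklet:
    """Hypothetical single-camera track segment used only for this sketch."""
    camera_id: int
    track_id: int
    embedding: Sequence[float]  # assumed mean re-ID feature of the tracklet


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def associate_across_cameras(tracklets: List[Tracklet], threshold: float = 0.7) -> List[int]:
    """Greedy cross-camera association with union-find.

    Pairs from different cameras are visited in decreasing similarity. Two
    clusters merge when similarity >= threshold and the merged cluster would
    not contain two tracklets from the same camera (a simplifying assumption).
    Returns one global id per input tracklet.
    """
    n = len(tracklets)
    parent = list(range(n))
    cams = [{t.camera_id} for t in tracklets]  # cameras covered by each root

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            if tracklets[i].camera_id != tracklets[j].camera_id:
                sim = cosine(tracklets[i].embedding, tracklets[j].embedding)
                pairs.append((sim, i, j))
    pairs.sort(reverse=True)  # most similar pairs first

    for sim, i, j in pairs:
        if sim < threshold:
            break  # all remaining pairs are less similar
        ri, rj = find(i), find(j)
        if ri == rj or cams[ri] & cams[rj]:
            continue  # already merged, or a camera would appear twice
        parent[rj] = ri
        cams[ri] |= cams[rj]

    # Relabel cluster roots as consecutive global ids in order of first appearance.
    labels: dict = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]
```

For example, two people each seen by two cameras collapse into two global identities:

```python
tracklets = [
    Tracklet(0, 1, [1.0, 0.0, 0.1]),
    Tracklet(1, 7, [0.9, 0.1, 0.1]),
    Tracklet(0, 2, [0.0, 1.0, 0.0]),
    Tracklet(2, 3, [0.1, 0.95, 0.0]),
]
print(associate_across_cameras(tracklets))  # [0, 0, 1, 1]
```

Greedy merging is used here only because it is short; published systems often use clustering or graph-based association instead, and they typically also exploit timing and camera topology, which this sketch ignores.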

Authors (5)
  1. Sanghyun Woo (31 papers)
  2. Kwanyong Park (15 papers)
  3. Inkyu Shin (19 papers)
  4. Myungchul Kim (6 papers)
  5. In So Kweon (156 papers)
