Traffic Scene Parsing through the TSP6K Dataset (2303.02835v2)
Abstract: Traffic scene perception is a critical computer vision task for building intelligent cities. To date, most existing datasets focus on autonomous driving scenes. We observe that models trained on those driving datasets often yield unsatisfactory results on traffic monitoring scenes. However, little effort has been put into improving traffic monitoring scene understanding, mainly due to the lack of dedicated datasets. To fill this gap, we introduce a specialized traffic monitoring dataset, termed TSP6K, containing images from the traffic monitoring scenario with high-quality pixel-level and instance-level annotations. The TSP6K dataset captures more crowded traffic scenes, with several times more traffic participants than existing driving scenes. We perform a detailed analysis of the dataset and comprehensively evaluate popular scene parsing methods, instance segmentation methods, and unsupervised domain adaptation methods. Furthermore, building on the TSP6K dataset and considering the vast difference in instance sizes, we propose a detail refining decoder for scene parsing, which recovers the details of different semantic regions in traffic scenes. Experiments show its effectiveness in parsing traffic monitoring scenes. Code and dataset are available at https://github.com/PengtaoJiang/TSP6K.